Medical Databases
Although the focus of many investigators is on sequence-based data, databases cataloging and organizing sequence information are not the only kinds of databases useful to the biomedical research community. An excellent example of a different kind of database that is tremendously useful in genomics is Online Mendelian Inheritance in Man (OMIM), the electronic version of the venerable catalog of human genes and genetic disorders originally founded by Victor McKusick and first published in 1966 (McKusick 1966, 1998; Amberger et al. 2014). OMIM, which is authored and maintained at The Johns Hopkins University School of Medicine, provides concise textual information from the published literature on most human conditions having a genetic basis, as well as pictures illustrating the condition or disorder (where appropriate), full citation information, and links to a number of useful external resources, some of which will be described below. As will become obvious through the following example, a basic knowledge of OMIM should be part of the armamentarium of physician-scientists with an interest in the clinical aspects of genetic disorders.
Figure 2.12 A list of structures deemed similar to pdb:4URT using VAST+. The table is sorted by the root-mean-square deviation of all aligned residues (in Å), from smallest to largest. Details on each individual structure in the list can be found by clicking on its Protein Data Bank (PDB) ID number.
OMIM has a defined numbering system in which each entry is assigned a unique number (a “MIM number”) that is similar to an accession number, with certain positions within that number indicating information about the genetic disorder itself. The first digit represents the mode of inheritance of the disorder: 1, 2, and 6 stand for autosomal loci or phenotypes, 3 for X-linked loci or phenotypes, 4 for Y-linked loci or phenotypes, and 5 for mitochondrial loci or phenotypes. A symbol preceding the MIM number indicates the type of entry: an asterisk (*) indicates a gene, a hash sign (#) indicates an entry describing a phenotype, a plus sign (+) indicates that the entry describes a gene of known sequence and phenotype, and a percent sign (%) indicates a confirmed Mendelian phenotype or locus for which the underlying molecular basis is unknown. If no Mendelian basis has been clearly established for a particular entry, no symbol precedes the MIM number.
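The numbering scheme just described can be expressed as a small lookup. The following Python fragment is only a minimal sketch of that idea, not part of OMIM or of any OMIM software: the function name `parse_mim`, the `MimEntry` type, and the two lookup tables are hypothetical names introduced here for illustration. It also assumes that a MIM number consists of exactly six digits, which the text above does not state.

```python
import re
from typing import NamedTuple

# Meaning of the symbol preceding a MIM number, as described in the text.
# The empty string stands for "no symbol".
PREFIX_MEANING = {
    "*": "gene",
    "#": "phenotype description",
    "+": "gene of known sequence and phenotype",
    "%": "confirmed Mendelian phenotype or locus, molecular basis unknown",
    "": "no clearly established Mendelian basis",
}

# Meaning of the first digit of a MIM number, as described in the text.
FIRST_DIGIT_MEANING = {
    "1": "autosomal",
    "2": "autosomal",
    "6": "autosomal",
    "3": "X-linked",
    "4": "Y-linked",
    "5": "mitochondrial",
}


class MimEntry(NamedTuple):
    """Hypothetical container for the parts of a prefixed MIM number."""
    prefix: str
    number: str
    entry_type: str
    inheritance: str


# Assumption: an optional prefix symbol followed by exactly six digits.
_MIM_PATTERN = re.compile(r"^([*#+%]?)\s*(\d{6})$")


def parse_mim(label: str) -> MimEntry:
    """Split a label such as '#157600' into its prefix and number and
    look up what each part means under the scheme described above."""
    match = _MIM_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognizable MIM number: {label!r}")
    prefix, number = match.groups()
    inheritance = FIRST_DIGIT_MEANING.get(
        number[0], "first digit not covered by the scheme described here"
    )
    return MimEntry(prefix, number, PREFIX_MEANING[prefix], inheritance)


if __name__ == "__main__":
    # Sample strings chosen only to exercise the format.
    for label in ("#157600", "*120470", "300000"):
        print(parse_mim(label))
```

A lookup table is used rather than a chain of conditionals so that the mapping between symbols, digits, and meanings stays readable alongside the prose description; first digits that the text does not mention are reported as such rather than guessed.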
Figure 2.13 Online Mendelian Inheritance in Man (OMIM) entries related to the DCC gene. The hash sign (#) preceding the first entry indicates that it is an entry describing a phenotype – here, mirror movements. The second entry is preceded by an asterisk (*), indicating that it is a gene entry – here, for the DCC gene.
Here, we will continue the Entrez example from the previous section, following the OMIM (cited) link found in the Discovery Column shown in Figure 2.3. An intermediate landing page will then appear listing two entries: a phenotype entry describing mirror movements, followed by an entry for the DCC gene (Figure 2.13). Clicking on the second entry leads the user to the OMIM page for the DCC gene shown in Figure 2.14. The Text section of the entry provides a comprehensive overview of seminal details regarding the identification of the gene, its structure, relevant biochemical features, mapping information, the gene's function and molecular genetics, and studies involving animal models. For individuals starting work on a new gene or genetic disorder, this expertly curated section of the OMIM entry should be considered “required reading,” as it presents the most important aspects of any given gene, with links to the original studies cited within the narrative embedded throughout. A particularly useful feature is the list of allelic variants (Figure 2.15); a short description of the clinical or biochemical outcome of the mutation is given after each allelic variant. At the time of this writing, there are over 5200 OMIM entries containing at least one allelic variant that either causes or is associated with a discrete phenotype in humans. Note that the allelic variants shown in Figure 2.15 produce significantly different clinical outcomes (two different types of cancer as well as the motor disorder used throughout this example), an interesting case where different mutations in the same gene lead to distinct genetic disorders.
Figure 2.14 The Online Mendelian Inheritance in Man (OMIM) entry for the DCC gene. Each entry in OMIM includes information such as the gene symbol, alternative names for the disease, a description of the disease, a clinical synopsis, and references. See text for details.
The studies leading to these and similar observations described in a typical entry often provide the foundation for clinical trials aimed at translating this knowledge into new prevention and treatment strategies. NIH's central information source for clinical trials, aptly named ClinicalTrials.gov, contains data on both publicly and privately funded clinical trials being conducted worldwide. Figure 2.16 shows the first eight of more than 4600 clinical trials actively recruiting patients with colorectal cancer at the time of this writing. Clicking on the name of a protocol brings the user to a page providing information on the study, including the principal investigator's name and contact information. Clicking the On Map tab at the top of the page produces a clickable map of the world showing how many clinical trials are being conducted in each region or country (Figure 2.17); this view is useful in identifying trials that are geographically close to a potential study subject's home. While we, as scientists, tend to focus on the types of information discussed throughout the rest of this chapter, the clinical trials site is, unarguably, the most important of the sites covered in this chapter, as it provides a means through which patients with a given genetic or metabolic disorder can receive the latest, cutting-edge treatment, treatment that may make a substantial difference to their quality of life.
Figure 2.15 An example of a list of allelic variants that can be found through Online Mendelian Inheritance in Man (OMIM). The figure shows three of the four allelic variants for the DCC gene. Two of the documented variants lead to cancers of the digestive tract, while two are associated with a movement disorder. The description under each allelic variant provides information specific to that particular mutation.