Fig. 4.2. This figure is taken from Marklund et al. (1996) and shows the alignment for two alleles in a region of the gene MC1R, the variant in codon 83 responsible for the difference between the E and e allele and the resulting amino acid change at position 83 from serine to phenylalanine (the amino acids are listed in the figure using three-letter codes for each amino acid).
This is called a non-synonymous mutation because it changes an amino acid. Furthermore, the change is chemically significant because phenylalanine is hydrophobic while serine is hydrophilic. Changing the amino acid at this position destroys the binding site of this receptor and, as a consequence, it cannot interact with melanocyte-stimulating hormone to create black pigment. The default pigment is red. When non-synonymous variants are found, one of the major questions is what impact they may have on gene function. Proof could come from gene editing and cell biology experiments. However, these experiments are costly, time-consuming, and not justifiable for every genetic variant that is discovered. Therefore, scientists turn to computer modeling to make predictions about the effects of a mutation.
Popular programs include SIFT (Kumar et al., 2009) and PolyPhen-2 (Adzhubei et al., 2010) for simple substitutions, and PROVEAN for simple substitutions, deletions, and insertions (Choi et al., 2012). These programs assess the likelihood that changes in amino acids will alter protein function based on the physical and chemical properties of the amino acids. Scientists can enter the amino acid sequences associated with the two variants and the programs will return a prediction as to whether the variant will have no effect, a possible deleterious effect, or a probable deleterious effect.
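The distinction between a substitution that changes an amino acid and one that does not can be sketched in a few lines of Python. This is only an illustration of the definition, not of how SIFT, PolyPhen-2, or PROVEAN work. The function name is hypothetical, and the codons TCC (serine) and TTC (phenylalanine) are assumed here as an example of a serine-to-phenylalanine change; the actual MC1R sequence is the one shown in Fig. 4.2.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (kind, ref_aa, alt_aa) for a change within a single codon.

    Hypothetical helper: 'synonymous' means the encoded amino acid is
    unchanged, 'non-synonymous' means it differs.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


# Assumed example codons: TCC encodes serine (S), TTC phenylalanine (F).
print(classify_substitution("TCC", "TTC"))
# A change in the third base of TCC leaves serine in place.
print(classify_substitution("TCC", "TCT"))
```

The one-letter codes S and F correspond to the three-letter codes Ser and Phe used in the figure.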
Mutation versus variant
So far, we have used the terms “variant” and “mutation” interchangeably. This is common in genetics, although the terms do have subtly different meanings. Variation denotes that different forms exist for a gene. A major theme for this book is to understand and appreciate the extent of genetic variation in the horse. The term “mutation” is a charged term, carrying the concept of normal DNA sequences versus those which are not normal. In the early 1900s scientists coined the term “wildtype” to denote what they considered to be the normal variant in a population. The alternative was a mutation. The term wildtype is still occasionally used. The term “mutation” is frequently used when referring to disease-causing variants. In any case, the term mutation is appropriate at the moment when the mutational event is observed, for example, as a change in DNA sequence between parents and offspring.
While we now think of the chestnut allele as a variant, studies of ancient DNA plus studies of the function of the MC1R protein demonstrated that the ancestors of modern horses had black pigment and a mutation of the MC1R gene led to the creation of the chestnut variant. As noted in earlier chapters, early breeders liked some of these color variants and selected for them such that they have become characteristic of breeds. Chestnut is very common among Saddlebred horses. Likewise, gray coat color is the consequence of an ancient mutation but is now very common among Arabian and Lipizzan horses. We will follow the practice of calling the alleles for chestnut and gray variants rather than mutations, although they originated as mutations of what could be called the wildtype alleles.
Other roles of DNA: introns and exons, and regulatory DNA sequences
Not all DNA sequences code for amino acids. The sections that code for amino acids are called exons. Most proteins are encoded by (i.e. transcribed and translated from) multiple exons separated by DNA sequences called introns. Together, exons and introns comprise what we have traditionally called genes. When DNA is transcribed into RNA, the entire section of introns and exons is made into RNA. Next, editing enzymes process the RNA strand, clipping out introns to make the final transcript. The final transcript, called messenger RNA (mRNA), is used for translation of the message into the amino acid sequence of the protein. For example, the gene for TRPM1 spans 103,840 bases on chromosome 1 when including all exons and introns. After processing, TRPM1 has five possible mRNA transcripts based on alternate splicing of 24–26 exons, with the final transcript ranging in length from 5459 bases to 5630 bases. Fig. 4.3 illustrates a hypothetical intron-exon structure of a gene and the resulting mRNA following transcription and RNA processing to remove 3 introns separating 4 exons.
Fig. 4.3. This image illustrates a hypothetical gene with four exons and three introns. The introns can be very large and the exons very small. The entire exon/intron structure is transcribed, then the introns are removed. Following transcription and mRNA processing, only the exons remain.
The most common signal to begin or conclude splicing out a section of the transcript occurs at the beginning of the intron (bases GT) and the end of the intron (bases AG). A mutation at the first site will cause inclusion of the intron sequence in the processed mRNA while a mutation at the second will cause deletion of the next exon. Therefore, it is important to consider intron DNA sequences as well as exon DNA sequences when looking for genetic variants that affect genes. The resulting change in how the transcript is assembled is called alternate splicing; it is implicated in some human diseases and has been identified as the cause of the sabino1 coat color variant in horses. In addition, alternate splicing appears to be normal in the function of many cells and tissues. Various regulatory factors can mask existing splice sites and cause alternate splicing for most genes. We are still learning how this occurs. However, we believe that the creation of proteins with slightly different domains creates proteins with slightly different functions.
In addition to introns and exons, there are large stretches of DNA between genes, called intergenic DNA, that separate exon/intron regions. We do not know the role of DNA in these regions. However, this DNA is probably important for maintaining spatial relationships among DNA chromosomal elements, and much of it is actually transcribed and may play a role in regulating transcription.
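The role of the GT and AG signals in removing introns can be sketched with a toy example in Python. The sequence, the exon coordinates, and the function names below are all hypothetical and chosen only for illustration; real splicing depends on additional sequence signals and regulatory factors that this sketch ignores.

```python
def has_canonical_splice_sites(intron):
    """True if the intron begins with GT and ends with AG (DNA letters)."""
    return intron.startswith("GT") and intron.endswith("AG")


def splice(transcript, exons):
    """Join the exons of a transcript, checking each intron's boundaries.

    `exons` is a list of (start, end) positions, 0-based with the end
    excluded, sorted from first exon to last. The sequence between two
    consecutive exons is treated as an intron.
    """
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = transcript[prev_end:next_start]
        if not has_canonical_splice_sites(intron):
            raise ValueError(f"intron {intron!r} lacks the GT...AG signal")
    return "".join(transcript[start:end] for start, end in exons)


# Hypothetical two-exon gene: exon 1, one intron, exon 2.
exon1, intron1, exon2 = "ATGGCC", "GTAAGTTTTCAG", "TTCTGA"
gene = exon1 + intron1 + exon2
exon_positions = [(0, 6), (18, 24)]

print(splice(gene, exon_positions))  # exons joined, intron removed

# Changing the first intron base (GT to AT) removes the signal, so the
# sketch refuses to splice rather than guessing at the outcome.
mutated = exon1 + "AT" + intron1[2:] + exon2
print(has_canonical_splice_sites(mutated[6:18]))
```

Raising an error for a damaged splice site is a design choice for the sketch; in a cell the outcome is an altered transcript, as described above.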
Examples of DNA changes affecting genes
Nucleotide substitution changing an amino acid
These include variation in the coat color genes Extension or MC1R, aka black/bay/chestnut (Marklund et al., 1996; see Chapter 7), Cream dilution (Mariat et al., 2003; Chapter 8), and Champagne dilution (Cook et al., 2008; Chapter 8). These three coat color genes show genetic variation because of a SNP in the gene that changes an amino acid making up the protein. These would be called non-synonymous variants since they change an amino acid in the protein. Changes in amino acids can alter receptor function or disable enzymatic function.
Deletion/loss of DNA resulting in loss of the codon reading frame
Lavender foal syndrome (Brooks et al., 2010) and severe combined immunodeficiency syndrome (Shin et al., 1997) both involve deletions in the coding sequence for a protein that destroy the function of that protein by shifting the reading frame.
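How a deletion shifts the reading frame can be sketched in Python with a short, made-up coding sequence. The sequence and the function names are hypothetical and do not represent the genes involved in either disease; the point is only that removing a number of bases not divisible by three changes every codon downstream of the deletion.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base, codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_bases(dna, start, length):
    """Return the sequence with `length` bases removed beginning at `start`."""
    return dna[:start] + dna[start + length:]


def shifts_reading_frame(deletion_length):
    """A deletion keeps the frame only if its length is a multiple of three."""
    return deletion_length % 3 != 0


reference = "ATGGCCTTCAAACGTTGA"             # made-up coding sequence
one_base = delete_bases(reference, 3, 1)     # frameshift deletion
three_bases = delete_bases(reference, 3, 3)  # in-frame deletion

print(translate(reference))    # reference protein
print(translate(one_base))     # every amino acid after the deletion differs
print(translate(three_bases))  # one amino acid lost, the rest unchanged
```

In this toy example the one-base deletion also removes the original stop codon from the reading frame, which is one reason frameshift deletions tend to be so damaging to the protein.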