2.3. Substitution(s) in protein structures
The substitution of one amino acid by another (a mutation) is one of the fundamental events of molecular evolution, with variable consequences for proteins. The majority of mutations have no major effect on the phenotype – they are neutral – but some of them can cause disease (Studer et al. 2013). Local changes in sites that bind other molecules can affect the function of proteins (Gong et al. 2009), and long-range effects on the overall structure can also be observed (Zhou et al. 2007).
Many studies agree that the majority of substitutions have no significant effect on the overall structure, stability or function of the protein. It has been shown that 75% of the amino acids in a protein can be substituted without significant alteration of its structure (Sander and Schneider 1991; Shakhnovich and Gutin 1991; Schaefer and Rost 2012). These observations support the neutral hypothesis of point mutations, but this does not mean that all mutations are neutral: a substantial fraction of point mutations are in fact counter-selected because their impact on the cell is negative. For example, the probability that a human DNA repair enzyme, 3-methyladenine DNA glycosylase, becomes non-functional after a random mutation is 34% (± 6%), and this proportion can be generalized to other protein families (Guo et al. 2004).
2.4. Effect on overall structure and function
From a structural point of view, the effect of an amino acid substitution is difficult to model because even a single substitution can have drastic consequences. A striking example is the L16A mutation (leucine 16 replaced by alanine), located in the homeodomain of a DNA-binding protein of Drosophila melanogaster, which profoundly modifies the structure of the native protein while maintaining the three helices (Religa et al. 2005) (Figure 2.3).
However, the link between a structural change and a change in function or a disease is neither obvious nor systematic. For example, in the case of human lysozyme, for which many structures are known, two natural mutants, D67H and I56T, form amyloid fibrils in the extracellular space of multiple organs and tissues, resulting in non-neuropathic systemic amyloidosis. In the D67H mutant (Figure 2.4(a)), the structure of the lysozyme is highly disrupted in two of its loops, while the structure of the I56T mutant (Figure 2.4(b)) appears little disrupted, as shown by the structures of the mutants superimposed on that of the native lysozyme. Nevertheless, without attempting to predict whether a mutation will be pathogenic, it has been proposed to model the effect of a mutation on a protein by assessing the variation in its stability.
Figure 2.3. a) DNA-binding protein of Drosophila melanogaster (PDB code 1enh); b) L16A mutant of the same protein (PDB code 1ztr). The mutated position is shown with its side chain. For a color version of this figure, see www.iste.co.uk/grandcolas/systematics.zip
Figure 2.4. a) Superimposed structures of the D67H mutant (green, PDB code 1lyy) and the native structure (blue, PDB code 2nwd) of the human lysozyme; b) superimposed structures of the I56T mutant (orange, PDB code 1loz) and the native structure (blue, PDB code 2nwd) of the human lysozyme. The mutated positions are represented with their side chains. For a color version of this figure, see www.iste.co.uk/grandcolas/systematics.zip
2.5. Effect on stability
Proteins are said to be “marginally stable”: typically, the folding free energy (ΔG) differs by only 3–7 kcal/mol between the folded and unfolded conformations. Amino acid substitutions can thus have a significant effect on protein stability: the effect of a single mutation is on average −0.95 kcal/mol according to the Protherm database (Gromiha and Sarai 2010), which in 2017 included ΔΔG measurements for 1,866 proteins of known structure. ΔΔG is the difference in folding free energy between the native and the mutated protein.
This low protein stability is assumed to be either the result of a balance between function and stability (DePristo et al. 2005), or the result of a balance between destabilizing mutations and selection against highly unstable proteins (Taverna and Goldstein 2002; Bloom et al. 2007; Zeldovich et al. 2007).
Two principles must be kept in mind:
– function is often dependent on a dynamic effect of structure;
– any protein must be degradable at a cost that is not restrictive for the cell.
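To make the notion of marginal stability concrete, the following Python sketch converts a stability value into a folded fraction under a simple two-state model. Several elements are assumptions introduced for illustration and do not come from the text: the two-state model itself, a temperature of 298.15 K, the convention that ΔG is counted as a positive unfolding free energy with ΔΔG = ΔG(mutant) − ΔG(native) (so that negative values are destabilizing), and the function names fraction_folded and unfolded_ratio_change.

```python
import math

# Gas constant in kcal/(mol*K); the temperature default is an assumption.
R_KCAL = 0.0019872


def fraction_folded(dg_unfold, temperature=298.15):
    """Folded fraction in a two-state model.

    dg_unfold is the unfolding free energy in kcal/mol
    (assumed convention: positive means the folded state is favored).
    """
    rt = R_KCAL * temperature
    return 1.0 / (1.0 + math.exp(-dg_unfold / rt))


def unfolded_ratio_change(ddg, temperature=298.15):
    """Factor by which the unfolded/folded ratio is multiplied by a mutation.

    ddg = dG(mutant) - dG(native) in kcal/mol (assumed convention:
    negative values are destabilizing).
    """
    rt = R_KCAL * temperature
    return math.exp(-ddg / rt)


if __name__ == "__main__":
    mean_ddg = -0.95  # average single-mutation effect quoted in the text
    for dg in (3.0, 5.0, 7.0):  # stability range quoted in the text
        native = fraction_folded(dg)
        mutant = fraction_folded(dg + mean_ddg)
        print(f"dG = {dg:.1f} kcal/mol: folded fraction "
              f"{native:.5f} (native) vs {mutant:.5f} (mutant)")
    print(f"unfolded/folded ratio multiplied by "
          f"{unfolded_ratio_change(mean_ddg):.2f}")
```

Under these assumptions, a ΔΔG of −0.95 kcal/mol multiplies the unfolded-to-folded ratio by about five, which illustrates why a single substitution matters when the total margin is only 3–7 kcal/mol.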
Many algorithms and web servers have been developed to estimate the variation in Gibbs free energy (ΔΔG) caused by a point mutation: FoldX (Guerois et al. 2002) and Rosetta (Kellogg et al. 2011) are among the best known. SPROUTS (Lonquety et al. 2009) is a web server combining the results of several methods. These methods try to predict whether a given mutation will be destabilizing, neutral or stabilizing. The comparison between predicted and experimental energy variations is satisfactory overall (Figure 2.5(a)): the correlation between the predictions of FoldX and the measurements of Protherm is 0.59. Nevertheless, the difference can be quite large in some cases (Lonquety et al. 2009). It is interesting to note that the predictions for the change from wild type to mutant and for the change from mutant to wild type differ considerably in absolute value (Figure 2.5(b)).
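The comparison described above can be sketched in Python as follows. The numbers are invented toy values, not Protherm or FoldX data, and the function names (pearson, absolute_value_gaps) are hypothetical. The sketch only illustrates how a correlation between predicted and measured ΔΔG, and the discrepancy in absolute value between the two directions of a mutation, might be computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def absolute_value_gaps(forward, reverse):
    """Difference |ddG(wild type -> mutant)| - |ddG(mutant -> wild type)|.

    For a perfectly consistent predictor the two directions have the same
    magnitude (and opposite sign), so every gap is zero.
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse series must have the same length")
    return [abs(f) - abs(r) for f, r in zip(forward, reverse)]


if __name__ == "__main__":
    # Invented toy values in kcal/mol, for illustration only.
    experimental = [-1.2, -0.4, 0.3, -2.1, -0.9]
    predicted_forward = [-0.8, -0.9, 0.1, -1.5, -0.2]
    predicted_reverse = [0.3, 0.5, -0.2, 0.6, 0.4]

    print("correlation:", round(pearson(predicted_forward, experimental), 2))
    print("gaps:", [round(g, 2) for g in
                    absolute_value_gaps(predicted_forward, predicted_reverse)])
```

A correlation computed this way summarizes the overall agreement but hides individual outliers, which is why large differences for particular mutations remain compatible with a value such as the 0.59 reported above.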
Figure 2.5. Comparison of predicted ΔΔG by FoldX and those experimentally measured (Protherm)
COMMENT ON FIGURE 2.5. – a) Prediction by FoldX of ΔΔG for 130 proteins present in Protherm and belonging to the 11 families in which at least 20 different point mutant structures are known (see section 2.6 for a description of this dataset). b) Prediction by FoldX of ΔΔG for the families in which at least 20 different point mutant structures are known. The abscissa shows the predicted value of ΔΔG from the