Bioinformatics

Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Biology
ISBN: 9781119335955
the two sequences to be compared are specified in advance by the user. The method is particularly useful for comparing sequences that have been determined to be homologous through experimental methods or for making comparisons between sequences from different species. Returning to the Protein BLAST (BLASTP) search page shown in Figure 3.6, checking the box marked “Align two or more sequences” changes the structure of the page, allowing the user to enter both the query and the subject sequences that will be compared with one another (Figure 3.11). As with any BLAST search, the user can adjust the standard array of BLAST-related options, including the choice of scoring matrix and gap penalties.

A sample of the results produced by the BLAST 2 Sequences method is shown in Figure 3.12, comparing the transcription factor SOX-1 from H. sapiens with that of the ctenophore Mnemiopsis leidyi, a member of the earliest branching animal lineage, which dates back at least 500 million years in evolutionary time (Ryan et al. 2013; Schnitzler et al. 2014). The major difference between this output and the typical BLAST output is the inclusion of a dot matrix view of the alignment, or “dotplot.” A dotplot provides a graphical representation of the degree of similarity between the two sequences being compared, allowing for the quick identification of regions of local alignment, direct or inverted repeats, insertions, deletions, and low-complexity regions. The dotplot in Figure 3.12 indicates two regions of alignment, and additional information on those regions is provided in the Alignments section at the bottom of the figure. As with all BLAST searches, the Alignments section provides the usual set of scores, the E value, and percentages for identities, positives, and any gaps that may have been introduced.

Figure 3.11 Performing a BLAST 2 Sequences alignment.

Figure 3.12 Typical output from a BLAST 2 Sequences alignment. The standard graphical view is shown at the top of the figure, here indicating two high-scoring segment pairs for the alignment of the sequences for the transcription factor SOX-1 from human and the ctenophore Mnemiopsis leidyi.
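The idea behind a dot matrix can be sketched in a few lines of Python. This is a generic, identity-based dotplot written for illustration only, not a reproduction of how the BLAST web page draws its plot. The function names, the window size, and the toy sequences are all assumptions introduced here.

```python
def dotplot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) window start positions that match.

    A dot is placed at (i, j) when the window of seq_a starting at i and the
    window of seq_b starting at j agree in at least min_matches positions.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: rows follow seq_a, columns follow seq_b."""
    lines = ["  " + seq_b]
    for i, residue in enumerate(seq_a):
        row = "".join(
            "*" if (i, j) in dots else "." for j in range(len(seq_b))
        )
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy sequences (hypothetical): the second carries a two-residue insertion.
    a = "MKRPNAFM"
    b = "MKRPGGNAFM"
    print(render(a, b, dotplot(a, b)))
```

In the toy example the dots fall on two diagonal runs. The shift from one diagonal to the other marks the insertion, which is the kind of feature a dotplot makes easy to spot by eye.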

There is also a variation of MegaBLAST called discontiguous MegaBLAST. This version is designed for comparing divergent sequences from different organisms, where one would expect low sequence identity. It uses a discontiguous word approach that differs from the seeding used by the rest of the programs in the BLAST suite: rather than seeding the search with query words made up of a fixed number of consecutive positions, it examines non-consecutive positions over longer sequence segments (Ma et al. 2002). The approach has been shown to find statistically significant alignments even when the degree of similarity between sequences is very low.
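A minimal sketch of discontiguous seeding follows, assuming a template string in which "1" marks a position that must match and "0" marks a position that is ignored. The template below samples 11 positions across a 16-base span and skips every third base; it is chosen here for illustration, and the function names and toy sequences are likewise hypothetical.

```python
# Illustrative template: 11 sampled positions spread over a 16-base span.
TEMPLATE = "1101101101101101"


def spaced_key(seq, start, template=TEMPLATE):
    """Return the bases under the template's '1' positions, or None if the
    template would run past the end of the sequence."""
    if start + len(template) > len(seq):
        return None
    return "".join(
        seq[start + k] for k, bit in enumerate(template) if bit == "1"
    )


def seed_hits(query, subject, template=TEMPLATE):
    """List (query_pos, subject_pos) pairs whose sampled bases are identical."""
    index = {}
    for j in range(len(subject) - len(template) + 1):
        index.setdefault(spaced_key(subject, j, template), []).append(j)
    hits = []
    for i in range(len(query) - len(template) + 1):
        for j in index.get(spaced_key(query, i, template), []):
            hits.append((i, j))
    return hits


def contiguous_hits(query, subject, word=11):
    """Same search with an all-ones template, i.e. ordinary contiguous words."""
    return seed_hits(query, subject, "1" * word)


if __name__ == "__main__":
    # Toy pair (hypothetical) that differs at every third base.
    query = "ATGGCTAAAGGTCTGA"
    subject = "ATAGCCAAGGGCCTAA"
    print("discontiguous:", seed_hits(query, subject))
    print("contiguous:   ", contiguous_hits(query, subject))
```

The two toy sequences are identical at only 11 of 16 positions, with a mismatch at every third base. No contiguous 11-base word is shared, so the contiguous search finds no seed, while the discontiguous template still produces a hit at the start of both sequences.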

The variation of the BLAST algorithm known as PSI-BLAST (for position-specific iterated BLAST) is particularly well suited for identifying distantly related proteins, including proteins that may not have been found using the traditional BLASTP method (Altschul et al. 1997; Altschul and Koonin 1998). PSI-BLAST relies on the use of position-specific scoring matrices (PSSMs), which are also often called profiles and are sometimes loosely equated with hidden Markov models, a related but more general way of describing a sequence family (Schneider et al. 1986; Gribskov et al. 1987; Staden 1988; Tatusov et al. 1994; Bücher et al. 1996). A PSSM is, quite simply, a numerical representation of a multiple sequence alignment, much like the multiple sequence alignments that will be discussed in Chapter 8. Embedded within a multiple sequence alignment is intrinsic sequence information that represents the common characteristics of that particular collection of sequences, frequently a protein family. By using a PSSM, one is able to use these embedded, common characteristics to find similarities between sequences with little or no absolute sequence identity, allowing for the identification and analysis of distantly related proteins. PSSMs are constructed by taking a multiple sequence alignment representing a protein family and then asking a series of questions, as follows.

       What residues are seen at each position of the alignment?

       How often does a particular residue appear at each position of the alignment?

       Are there positions that show absolute conservation?

       Can gaps be introduced anywhere in the alignment?
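The four questions above can be turned into a small worked example. The sketch below is a simplified illustration, not the PSI-BLAST procedure itself: it assumes a uniform background frequency for the 20 amino acids and a flat pseudocount, whereas PSI-BLAST uses sequence weighting and more elaborate estimates. The toy alignment and all function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_counts(alignment):
    """Which residues are seen at each position, and how often?"""
    return [Counter(column) for column in zip(*alignment)]


def conserved_positions(alignment):
    """Positions showing absolute conservation (one residue, no gaps)."""
    return [
        i for i, counts in enumerate(column_counts(alignment))
        if len(counts) == 1 and "-" not in counts
    ]


def gap_positions(alignment):
    """Positions where at least one sequence carries a gap."""
    return [
        i for i, counts in enumerate(column_counts(alignment))
        if "-" in counts
    ]


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score per residue and column (uniform background assumed)."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for counts in column_counts(alignment):
        observed = sum(counts[r] for r in ALPHABET)  # gaps are not counted
        total = observed + pseudocount * len(ALPHABET)
        pssm.append({
            r: math.log2(((counts[r] + pseudocount) / total) / background)
            for r in ALPHABET
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores for an ungapped segment of matching length."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


if __name__ == "__main__":
    toy_alignment = ["MKV-L", "MRVAL", "MKIAL", "MKV-L"]
    pssm = build_pssm(toy_alignment)
    print("conserved:", conserved_positions(toy_alignment))
    print("gapped:   ", gap_positions(toy_alignment))
    print("MKVAL scores", round(score(pssm, "MKVAL"), 2))
```

Residues that dominate a column receive positive scores and residues never seen there receive negative ones, so a sequence resembling the family scores well even if it is not identical to any single member of the alignment.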

      The Method