1 ICAR – National Institute for Biotechnology, New Delhi, India
2 ICAR – Indian Institute of Pulses Research, Kanpur, Uttar Pradesh, India
3 ICAR – Indian Institute of Agricultural Biotechnology, Ranchi, Jharkhand, India
4 Xcelris Lab Pvt Ltd., Ahmedabad, Gujarat, India
2.1 Introduction
Single‐nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the genome. They are genetic loci at which variation exists for a single nucleotide. To be called an SNP, a variant should be present in more than 1% of the individuals within a population. SNPs may arise as a substitution or as an insertion/deletion (InDel) of a single base, originating mainly from mutation or replication error. SNPs may fall within coding, noncoding, and intergenic regions. Because the genetic code is degenerate, an SNP within a coding region does not necessarily change the amino acid sequence of the protein. A synonymous (often called silent) SNP is one in which both alleles result in the same polypeptide sequence, whereas a nonsynonymous SNP creates a distinct polypeptide sequence. Nonsynonymous changes may be further classified as missense or nonsense: a missense mutation results in a different amino acid, while a nonsense mutation introduces a premature stop codon, ultimately producing a truncated peptide. SNPs outside protein‐coding regions may still affect gene splicing or transcription factor binding, thereby changing the expression dynamics of the gene.
Because SNPs occur throughout the genome and at high frequency, they provide genome‐wide coverage and support higher‐resolution mapping than other marker systems. Automated genotyping, high reproducibility, low genotyping cost, presence within genic regions, and gel‐free methodology are further advantages that other markers lack. Despite these advantages, SNPs also have certain disadvantages: SNP detection mostly requires DNA sequencing, high‐throughput analysis is required in the postsequencing steps, and SNP‐based genotyping experiments are generally associated with high equipment cost.
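The synonymous, missense, and nonsense categories described above can be made concrete by translating the reference and alternate codons. The following is a minimal Python sketch, assuming an in‐frame coding sequence that begins at its first codon, a single‐base substitution (InDels are not handled), and the standard genetic code; the function name `classify_coding_snp` and the example sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame CDS.

    Hypothetical helper for illustration. A stop-to-sense change (stop-loss)
    is reported as "missense" in this simplified sketch.
    """
    cds, alt = cds.upper(), alt.upper()
    if len(cds) % 3 != 0 or not 0 <= pos < len(cds) or alt not in BASES:
        raise ValueError("expected an in-frame CDS, a valid position and base")
    offset = pos % 3                      # position of the SNP within its codon
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"               # same amino acid: degenerate code
    if alt_aa == "*":
        return "nonsense"                 # premature stop codon
    return "missense"                     # different amino acid


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"  # hypothetical: Met-Ala-Trp-stop
    print(classify_coding_snp(cds, 5, "C"))  # GCT -> GCC: synonymous
    print(classify_coding_snp(cds, 4, "A"))  # GCT -> GAT: missense
    print(classify_coding_snp(cds, 8, "A"))  # TGG -> TGA: nonsense
```

Translating whole codons, rather than looking only at the changed base, is what lets the degeneracy of the code decide the outcome: the same third‐position change can be silent in one codon and nonsense in another.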
In the last couple of years, the detection and exploitation of DNA polymorphisms with molecular markers, and their utilization in breeding programs, have transformed the pace and accuracy of breeding. A clear shift from phenotype‐based breeding to molecular breeding has become evident. With the advancement of next‐generation sequencing (NGS) technology, the application of SNP‐based genotyping has risen rapidly. The popularity of these platforms depends on their cost as well as their ease of application. SNP‐based genotyping platforms also offer vital advantages over other marker systems, such as reproducibility, high marker density, and amenability to automation. Many modern breeding approaches, such as genome‐wide association studies (GWAS) and genomic selection (GS), require large numbers of markers, which are difficult to obtain with PCR‐based markers such as simple sequence repeats (SSRs).
2.2 SNP Genotyping Platforms
Accurate detection of SNPs is the first and foremost step toward their mainstream application in breeding experiments. Besides the sequencing technology used, read length, coverage, and a robust assembly approach are the key control points for sequencing errors. Long‐read sequencing technologies (read lengths in kilobases), such as PacBio single‐molecule real‐time (SMRT) sequencing (Eid et al. 2009), Illumina TruSeq synthetic long‐read technology (McCoy et al. 2014), and Oxford Nanopore sequencing (Branton et al. 2010), are available for developing a complete and correct reference genome assembly of any organism, which resolves many of the problems in SNP discovery. Beyond genome sequencing, these long‐read methodologies have been successfully deployed for SNP mining and provide a more precise understanding of isoform complexity in organisms lacking a reference genome (Piriyapongsa et al. 2018). Once a reference genome is available, applying SNPs in the breeding program of that species becomes comparatively easier. A concise layout of an SNP‐based breeding experiment is provided (Figure 2.1).
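How coverage relates to sequencing effort can be estimated with the classical Lander–Waterman approximation, in which the mean depth is C = N × L / G (N reads of length L on a genome of size G) and the depth at any base is treated as Poisson‐distributed with mean C. This Python sketch rests on those idealized assumptions (uniform, error‐free sampling) and is not taken from the chapter; the function names, the minimum depth `k` a variant caller might require, and all numbers in the example are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth C = N * L / G (Lander-Waterman approximation)."""
    return num_reads * read_length / genome_size


def fraction_covered_at_least(depth: float, k: int) -> float:
    """Expected fraction of bases with >= k reads, assuming depth ~ Poisson(depth)."""
    # P(X >= k) = 1 - sum_{i < k} e^(-C) * C^i / i!
    below_k = sum(math.exp(-depth) * depth ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # Hypothetical run: 10 million 150-bp reads on a 500-Mb genome.
    c = mean_coverage(10_000_000, 150, 500_000_000)
    print(f"mean depth: {c:.1f}x")  # 3.0x
    for k in (1, 5, 10):
        print(f"bases with >= {k} reads: {fraction_covered_at_least(c, k):.3f}")
```

With the hypothetical 3× depth, only a minority of bases reach five reads, which illustrates why low coverage limits confident SNP calling.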
2.2.1 SNP Genotyping Versus SNP Discovery
First, a diverse set of genotypes is selected on the basis of phenotype to create an SNP panel, on the premise that genotypes that are highly variable for several phenotypes should also be variable in their sequences. The reads generated by sequencing these genotypes (S1–S10 in Figure 2.1) are assembled against the reference genome sequence, and a set of hypervariable loci is filtered through statistical analysis. The process up to the identification of these hypervariable loci is known as SNP discovery. Once discovered, the loci can be genotyped in any other accession; this is known as SNP genotyping, genotyping being the process of determining the allelic status of an organism at a particular locus. The discovery of SNPs in crops prompted the development of simple and low‐cost genotyping platforms. A flowchart of the different SNP genotyping technologies is provided (Figure 2.2).
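The distinction between discovery and genotyping can be illustrated with a toy Python sketch. It assumes that the reads of each panel accession have already been aligned to the reference and reduced to one consensus base per reference position (homozygous, with "N" for missing data), and it stands in for the unspecified statistical analysis with a simple minor allele frequency (MAF) filter. The function names, the threshold `min_maf`, and the four‐accession panel are hypothetical.

```python
from collections import Counter


def discover_snps(reference: str, panel: dict[str, str], min_maf: float = 0.1) -> list[int]:
    """SNP discovery: 0-based positions polymorphic in the panel with MAF >= min_maf."""
    snps = []
    for pos in range(len(reference)):
        # Count the alleles observed at this position, ignoring missing calls.
        counts = Counter(seq[pos] for seq in panel.values() if seq[pos] != "N")
        if len(counts) < 2:
            continue  # monomorphic (or entirely missing) in this panel
        # Minor allele = second most frequent allele (biallelic assumption).
        minor_count = counts.most_common(2)[1][1]
        if minor_count / sum(counts.values()) >= min_maf:
            snps.append(pos)
    return snps


def genotype(accession: str, reference: str, snp_positions: list[int]) -> dict[int, str]:
    """SNP genotyping: allelic status of a new accession at the discovered loci."""
    calls = {}
    for pos in snp_positions:
        base = accession[pos]
        if base == "N":
            calls[pos] = "missing"
        elif base == reference[pos]:
            calls[pos] = "REF"
        else:
            calls[pos] = "ALT"
    return calls


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    panel = {  # hypothetical accessions S1-S4, already aligned to the reference
        "S1": "ACGTACGTAC",
        "S2": "ACGAACGTAC",
        "S3": "ACGAACGTTC",
        "S4": "ACGTACGTAC",
    }
    loci = discover_snps(reference, panel)
    print(loci)  # [3, 8]
    print(genotype("ACGTACGTTC", reference, loci))  # {3: 'REF', 8: 'ALT'}
```

Discovery scans every position of the panel once; genotyping then only inspects the short list of discovered loci in each new accession, which is why it can be run cheaply on many more samples.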
Figure 2.1 A pipeline for SNP discovery (S1–S10 are diverse accessions of the same species to which the reference belongs).
2.2.2 Types of SNP Genotyping Platforms
Although multiple classification systems exist, based on the reaction type, the fluorescence used, PCR usage, and so on, all SNP genotyping methods can broadly be categorized into two groups based on the detection mechanism, i.e. allelic discrimination and allelic detection. Both groups can be further classified based on the reaction chemistry and other variables.
2.2.2.1 Allelic Discrimination
Allelic discrimination depends on allele‐specific biochemical reactions in which the alternate alleles are distinguished by primer extension (primer extension methods), hybridization (arrays carrying probes for both alleles), or differential enzymatic cleavage patterns. A broad classification of the different allelic discrimination methods is given below:
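The primer‐extension principle can be sketched as a toy model in Python: an allele‐specific primer whose 3′‐terminal base corresponds to one SNP allele is assumed to be extended only on templates carrying that allele. The strict matching rule, the helper names, and the sequences are hypothetical simplifications; real polymerases can extend mismatched primers at low rates, which practical assays must control.

```python
def allele_specific_primers(left_flank: str, alleles: tuple[str, ...], length: int = 20) -> dict[str, str]:
    """Hypothetical helper: forward primers whose 3'-terminal base is each SNP allele."""
    stem = left_flank[max(0, len(left_flank) - (length - 1)):]
    return {allele: stem + allele for allele in alleles}


def primer_extends(primer: str, haplotype: str) -> bool:
    """Toy rule: extension only if the primer matches the haplotype perfectly,
    including its 3'-terminal (allele-specific) base."""
    return primer in haplotype


def genotype_by_extension(primers: dict[str, str], haplotypes: list[str]) -> str:
    """Call a diploid genotype from which allele-specific primers are extended."""
    extended = sorted(
        allele for allele, p in primers.items()
        if any(primer_extends(p, h) for h in haplotypes)
    )
    if not extended:
        return "no call"
    if len(extended) == 1:
        return f"{extended[0]}/{extended[0]}"  # only one allele present: homozygous
    return "/".join(extended)  # both allele-specific primers extended: heterozygous


if __name__ == "__main__":
    left, right = "TTGACCTGAAGCTCAGTCCA", "GGTACCATTG"  # hypothetical flanks
    hap_a, hap_g = left + "A" + right, left + "G" + right
    primers = allele_specific_primers(left, ("A", "G"), length=10)
    print(primers)  # {'A': 'CTCAGTCCAA', 'G': 'CTCAGTCCAG'}
    print(genotype_by_extension(primers, [hap_a, hap_a]))  # A/A
    print(genotype_by_extension(primers, [hap_a, hap_g]))  # A/G
```

Placing the polymorphic base at the primer's 3′ end is the design choice that matters here: that is the position where a mismatch most strongly blocks extension, so the presence or absence of product reports the allele.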
2.2.2.1.1 PCR‐Free Genotyping Technology
Among the conventional molecular markers, restriction fragment length polymorphism (RFLP) is based on the detection of mutations at restriction sites, which are usually SNPs. Except for RFLP, which relies on restriction digestion followed by detection of the digested DNA via Southern blotting, most SNP genotyping methods require amplification of the SNP‐containing genomic region by polymerase chain reaction (PCR) prior to detection. This preamplification step is unavoidable in most genotyping techniques, even though PCR is relatively expensive.
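Because RFLP detects SNPs that create or abolish a restriction site, the effect of an SNP on the fragment pattern can be predicted with a simple in silico digest. The Python sketch below uses the EcoRI recognition site GAATTC, which EcoRI cuts between G and A; it models only top‐strand cut positions on a linear molecule (overhangs are ignored), and the function name `digest` and the sequences are hypothetical.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the top-strand cut position within the site
    (1 for EcoRI, G^AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # continue scanning past this site
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    ref_allele = "AAAAGAATTCAAAAAAAAAA"  # hypothetical: intact EcoRI site
    alt_allele = "AAAAGAACTCAAAAAAAAAA"  # T->C SNP abolishes the site
    print(digest(ref_allele, "GAATTC", 1))  # [5, 15]
    print(digest(alt_allele, "GAATTC", 1))  # [20]
```

The reference allele yields two fragments while the alternate allele leaves the 20‐bp molecule uncut; this kind of length difference is what Southern blotting reveals in an RFLP assay.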
Invader Assay
Several PCR‐free genotyping methods have been developed to date; one such method is the Invader assay (Lyamichev et al. 1999). The Invader assay is based on nucleotide‐specific cleavage by a structure‐specific “flap” endonuclease in the presence of an invading oligonucleotide. This reaction is followed by a secondary reaction that generates allele‐specific signals using fluorescence resonance energy transfer (FRET) oligonucleotide cassettes. Both reactions in the Invader assay take place in a single vessel under isothermal conditions, so the assay is easily automated. Although the process is highly accurate, its requirement for a large amount of DNA is a major drawback, which can be overcome by coupling the reaction with PCR. Other PCR‐free methodologies have also been proposed, such as padlock probe ligation (Nilsson et al. 1994) and the rolling circle DNA amplification (RCA) process (Baner et al. 1998). Although these gel‐free methods can be automated and are highly accurate, their application is limited to only a small number of SNPs, hence genotyping of large numbers of SNPs