3.3.5 Pigeon pea
Pigeon pea is an important tropical grain legume that contributes to nutritional security in many developing countries across Africa, Asia, and tropical America. The completion of the draft genome using high‐throughput techniques facilitated rapid detection of genomic variation among pigeon pea landraces, breeding lines, and wild species. Varshney and coworkers generated whole‐genome resequencing data for 292 accessions, comprising 117 breeding lines, 166 landraces, 7 wild‐species accessions, and 2 others. They generated 21.7 billion paired‐end reads (101 bp in length), amounting to 2.19 Tb of sequence, and mapped them to the reference genome of the pigeon pea cultivar Asha. These mapping results can be used to identify SNPs, indels, and larger‐scale variants, including CNVs and PAVs; such findings can accelerate efforts to improve the sustainability and yield of pigeon pea (Varshney et al. 2017).
3.3.6 Vitis
Vitis is native to temperate regions, with 30 wild species indigenous to East Asia and 28 wild species from North America. Liang and coworkers resequenced the whole genomes of 472 Vitis accessions covering 48 of the 60 extant species from different geographical distributions. They generated around 4.1 Tb of WGS data (27.3 billion paired‐end raw reads) for the 472 accessions. For the majority of accessions, genome coverage was more than 80%. Using different filter criteria, they identified 77 726 929 SNPs, 10 278 017 indels, and about 23 000 copy number variants. Further filtering yielded 27 859 960 SNPs and 3 854 659 indels with minor allele frequency (MAF) >0.005, and a core set of 12 549 273 SNPs and about 0.9 million indels with MAF >0.005. Around 73.7% of the SNPs were located in intergenic regions and around 4% in coding regions. The outcomes of this study can play an important role in the improvement of Vitis cultivars (Liang et al. 2019).
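The role of the MAF threshold in such a filtering step can be illustrated with a minimal sketch. The function names, the genotype encoding (number of alternate alleles per diploid accession, with None for a missing call), and the toy sites are assumptions made for illustration only; they do not reproduce the pipeline of Liang et al. (2019).

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one site, or None if no accession was called.

    genotypes: alternate-allele counts per diploid accession (0, 1 or 2),
    with None marking a missing call (an assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(sites, threshold=0.005):
    """Keep only the sites whose MAF is strictly above the threshold."""
    kept = {}
    for site_id, genotypes in sites.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf > threshold:
            kept[site_id] = maf
    return kept


# Hypothetical toy data: site identifiers and genotypes are invented.
toy_sites = {
    "chr1:100": [0] * 199 + [1],   # one alternate allele in 400 chromosomes
    "chr1:200": [0, 1, 2, None],   # common variant with one missing call
    "chr1:300": [0, 0, 0],         # monomorphic site
}
common_sites = filter_by_maf(toy_sites)
```

With a threshold of 0.005, a variant seen once among 200 diploid accessions (MAF 0.0025) is discarded, which is why the variant counts drop sharply after this step.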
3.4 Whole‐Genome Pooled Sequencing
Genotyping populations with NGS technologies is convenient and appealing (Unamba et al. 2015). Whole‐genome pooled sequencing, popularly known as Pool‐seq, sequences a pool of multiple individuals from a population together rather than sequencing each individual separately (Fracassetti et al. 2015). The main advantage is that a large number of samples can be assayed cost‐effectively. Pooling is sometimes also a necessity, for example when individuals are difficult to separate or when there is insufficient DNA to create individual libraries. In population genetics, pooling is used to estimate SNP frequencies. The pooling approach has been applied to study the genetic loci governing a trait, to perform GWAS (Bastide et al. 2013), and to infer the demographic history of populations (Hellwege et al. 2017). During statistical analysis, random experimental errors in measuring allele frequencies from pooled DNA should be taken into account. Sophisticated pooling designs are being developed to account for hidden population stratification, confounders, and interactions, and to allow the analysis of haplotypes (Sham et al. 2002; Norton et al. 2004). Fracassetti et al. (2015) validated a pooled whole‐genome resequencing (Pool‐seq) technique in Arabidopsis lyrata in which only 1.6 ng of DNA per individual was used for library preparation. They compared the SNP frequencies obtained by pooling with those generated by individual‐based genotyping‐by‐sequencing (GBS) and found Pool‐seq to be a viable approach for obtaining population‐level SNP frequency data with good accuracy across a larger number of samples. They also examined the effects of sample size, sequencing depth per individual, and variant caller on population SNP frequency estimates. Because differential amplification occurs for many SNPs, this bias should be corrected when estimating allele frequencies from pooled DNA (Sham et al. 2002; Norton et al. 2004).
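How pool size and sequencing depth jointly limit the precision of a Pool‐seq frequency estimate can be sketched as follows. This is an illustrative two‐stage sampling approximation, not the method of any study cited above: it assumes diploid individuals contributing equal amounts of DNA, reads drawn independently from the pool, and no sequencing error or amplification bias. The function names are hypothetical.

```python
import math


def pooled_allele_frequency(alt_reads, total_reads):
    """Estimate the alternate-allele frequency as the fraction of reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def approx_standard_error(p_hat, n_individuals, depth):
    """Approximate standard error of a Pool-seq frequency estimate.

    Two sampling stages are combined: drawing 2n chromosomes from the
    population, then drawing `depth` reads from the pool. The variance is
    p(1-p) * [1/(2n) + (1 - 1/(2n)) / depth], with p replaced by p_hat.
    Assumes equimolar pooling and no sequencing or amplification error.
    """
    chromosomes = 2 * n_individuals
    variance = p_hat * (1 - p_hat) * (
        1 / chromosomes + (1 - 1 / chromosomes) / depth
    )
    return math.sqrt(variance)


# Hypothetical example: 30 alternate reads out of 100 at one SNP,
# from a pool of 50 diploid individuals.
p_hat = pooled_allele_frequency(30, 100)
se = approx_standard_error(p_hat, n_individuals=50, depth=100)
```

Under these assumptions, increasing depth beyond the number of pooled chromosomes gives diminishing returns, because the error from sampling individuals remains. Unequal DNA contributions and differential amplification add further error that this sketch does not model.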
Whole‐genome sequencing of pooled F2 individuals has been used to identify the mapping interval and candidate gene SNPs in maize (Klein et al. 2018).
NGS in combination with whole‐genome shotgun sequencing can be used to genotype nonmutant vs. mutant bulked segregant analysis (BSA) pools at markers across the genome (Klein et al. 2018). With the advent of NGS, BSA can be used to identify a very narrow mapping interval, and even the causative lesion, directly, without prior fine mapping. The genomic interval containing the mutant gene of interest can be discovered using BSA‐Seq data (Hill et al. 2013). In a pooled mutant BSA sample, unlinked markers appear heterozygous, whereas linked markers are homozygous for the mutant parent genotype unless recombination in the F1 has occurred between the mutant lesion and the marker. BSA‐Seq is also a useful approach for genome reduction, allowing sequencing depth to be increased without raising costs.
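The contrast between linked and unlinked markers in a mutant pool can be sketched as a windowed allele‐frequency scan. The code below is a simplified illustration and not the pipeline of Klein et al. (2018) or Hill et al. (2013); the input format (position, reads carrying the mutant‐parent allele, total reads), the window size, and the function names are assumptions.

```python
def window_mutant_allele_frequency(markers, window_size):
    """Pool read counts into fixed windows along one chromosome.

    markers: iterable of (position, mutant_reads, total_reads) tuples from
    the mutant pool (an assumed input format). Returns a dict mapping each
    window start to the fraction of reads carrying the mutant-parent allele.
    """
    totals = {}
    for position, mutant_reads, total_reads in markers:
        if total_reads == 0:
            continue  # marker not covered in this pool
        window_start = (position // window_size) * window_size
        mutant_sum, read_sum = totals.get(window_start, (0, 0))
        totals[window_start] = (mutant_sum + mutant_reads,
                                read_sum + total_reads)
    return {start: m / t for start, (m, t) in sorted(totals.items())}


def most_linked_window(frequencies):
    """Return the window start with the highest mutant-allele frequency."""
    return max(frequencies, key=frequencies.get)


# Hypothetical read counts: frequencies near 0.5 suggest unlinked markers,
# frequencies near 1.0 suggest linkage to the mutant lesion.
toy_markers = [
    (100, 5, 10), (900, 6, 10),
    (1200, 19, 20), (1800, 20, 20),
    (2500, 12, 20),
]
frequencies = window_mutant_allele_frequency(toy_markers, window_size=1000)
candidate = most_linked_window(frequencies)
```

In practice the window size, depth filters, and the statistic used to call the interval would need to be chosen for the species and cross at hand; this sketch only shows the underlying idea.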
3.5 Pinpointing Gene Through Whole‐Genome Resequencing‐based QTL Mapping
NGS has been used for QTL mapping and for pinpointing genes of interest. NGS‐based QTL mapping approaches address the issues caused by repetitive sequences and genome duplication, in addition to combining SNP discovery, SNP validation, and genotyping (Xu et al. 2013). Rice recombinant inbred line (RIL) populations were genotyped with the help of WGR, and QTL affecting plant height were shown to be linked to a region where the green revolution gene is situated (Huang et al. 2009; Xu et al. 2013). In rice, the precision of QTL mapping for grain width was evaluated using a previously cloned QTL: the grain width gene GW5 was found to be localized in the bin corresponding to its presumed genomic location (Xie et al. 2010). The genome complexity of some species, for example soybean (a paleopolyploid), hinders the application of sequencing‐based genotyping approaches. Owing to two genome duplication events, 75% of soybean genes are present in multiple copies (Schmutz et al. 2010). These paralogs pose further problems because short sequence reads make allelic variation difficult to distinguish. In addition, the abundant repetitive sequences and heterochromatic regions of the soybean genome present technical challenges for sequence alignment. Using 246 RILs sequenced at an average depth of 0.19×, Xu et al. (2013) employed NGS methods to find QTL for southern root‐knot nematode (RKN) resistance in soybean. They identified and validated SNPs, genotyped the RIL population, and deduced the parental source of each SNP allele. A linkage map was subsequently established using 3509 bins and 3489 recombination intervals as molecular markers. Of the three QTLs identified, one major QTL was 29.7 kb in size and mapped to bin 10 of chromosome 10. This QTL contains three true genes and two pseudogenes.
Based on sequence differences and gene expression analyses, the RKN resistance candidate genes Glyma10g02160 and Glyma10g02150 were identified, which encode pectin methylesterase inhibitor‐pectin methylesterase and pectin methylesterase inhibitor, respectively. This method is widely used to enhance QTL mapping accuracy in crops with a reference genome.
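The idea of deducing the parental source of alleles along a chromosome and then locating recombination breakpoints, which delimit bins, can be sketched with a simple sliding‐window majority vote. This is an assumed, simplified stand‐in and not necessarily the procedure used by Huang et al. (2009) or Xu et al. (2013); the call encoding ('A' and 'B' for the two parents, None for SNPs without reads), the window width, and the function names are hypothetical.

```python
def smooth_parental_calls(calls, window=5):
    """Smooth noisy per-SNP parental calls for one RIL by majority vote.

    calls: ordered list of 'A', 'B' or None (no reads at that SNP), an
    assumed encoding for low-coverage data. Ties carry the previous
    smoothed call forward.
    """
    half = window // 2
    smoothed = []
    for i in range(len(calls)):
        neighbourhood = calls[max(0, i - half): i + half + 1]
        a_count = neighbourhood.count("A")
        b_count = neighbourhood.count("B")
        if a_count == b_count:
            # No majority (or no data): reuse the previous decision.
            smoothed.append(smoothed[-1] if smoothed else None)
        else:
            smoothed.append("A" if a_count > b_count else "B")
    return smoothed


def breakpoints(smoothed):
    """Return SNP indices at which the inferred parental source changes."""
    points = []
    previous = None
    for i, call in enumerate(smoothed):
        if call is None:
            continue
        if previous is not None and call != previous:
            points.append(i)
        previous = call
    return points


# Hypothetical calls for one RIL: isolated discordant calls are treated as
# noise, while the sustained switch to 'B' is treated as a recombination.
toy_calls = ["A", "A", "B", "A", "A", None, "B", "B", "A", "B", "B"]
toy_smoothed = smooth_parental_calls(toy_calls)
toy_breakpoints = breakpoints(toy_smoothed)
```

Pooling the breakpoints from all RILs and cutting the chromosome at each of them would yield intervals without recombination in the population, which is the sense in which bins serve as markers for linkage map construction.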
3.6 Online Resources for Whole‐Genome Resequencing Data
3.6.1 SNP Seek
The SNP‐Seek database is a user‐friendly database that provides access to SNPs. The Rice SNP‐Seek database (www.snp‐seek.irri.org) contains SNP genotyping data for 3K rice varieties obtained from the 3000 Rice Genomes Project (Locedie et al. 2016). About 20 million rice SNPs were identified in this project for the 3K rice variety set against the reference Nipponbare genome (Alexandrov et al. 2015). The interface can be used to identify SNPs by gene ID (MSU locus name) or by genomic region, specified by chromosome and start and end positions. The site also provides phenotype and variety information for the 3K rice varieties from the International Rice Genebank Collection Information System (IRGCIS) (Locedie et al. 2016).
3.6.2 Rice Functional and Genomic Breeding
Rice Functional and Genomic Breeding (RFGB, www.rmbreeding.cn) is a user‐friendly database developed for breeding applications based on SNPs and InDels from the 3000 Rice Genomes Project. The RFGB database bridges the gap between phenotypic and genotypic data sets of the sequenced genome (Chun‐Chao