Review

Harnessing the power of RADseq for ecological and evolutionary genomics

Kimberly R Andrews et al. Nat Rev Genet. 2016 Feb.

Abstract

High-throughput techniques based on restriction site-associated DNA sequencing (RADseq) are enabling the low-cost discovery and genotyping of thousands of genetic markers for any species, including non-model organisms, which is revolutionizing ecological, evolutionary and conservation genetics. Technical differences among these methods lead to important considerations for all steps of genomics studies, from the specific scientific questions that can be addressed, and the costs of library preparation and sequencing, to the types of bias and error inherent in the resulting data. In this Review, we provide a comprehensive discussion of RADseq methods to aid researchers in choosing among the many different approaches and avoiding erroneous scientific conclusions from RADseq data, a problem that has plagued other genetic marker types in the past.

Figures

Figure 1

Step-by-step illustration of five RADseq library preparation protocols. All protocols begin by digesting relatively high-quality genomic DNA with one or more restriction enzymes. For most protocols, the sequencing adapters (oligos) are added in two stages: one set of oligos is added during a ligation step early in the protocol, and a second set is incorporated during a final PCR step. The second set of oligos extends the fragment to produce the complete Illumina adapter sequences. In contrast, the original RADseq adds adapters in three stages. For Illumina sequencing, the adapters on either end of each DNA fragment must differ, and therefore some protocols (e.g. original RADseq, ddRAD, ezRAD) use “Y-adapters” that are structured to ensure that only fragments with different adapters on either end are PCR-amplified (illustrated here as Y-shaped adapters). Other protocols (e.g. GBS) simply rely on the fact that fragments without the correct adapters will not be sequenced. To generate fragments of an ideal length for sequencing, most methods use common-cutter enzymes (e.g. 4–6 bp cutters) to generate a wide range of fragment sizes, followed by a direct size selection (gel-cutting or magnetic beads, e.g. ddRAD, ezRAD) or an indirect size selection (as a consequence of PCR amplification or sequencing efficiency, e.g. GBS). In contrast, the original RADseq uses a mechanical shearing step to produce fragments of an appropriate size, and incorporates a size selection step only to increase Illumina sequencing efficiency and remove adapter dimers. 2bRAD uses type IIB restriction enzymes to produce small fragments of equal size across all loci (33–36 bp).
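
The digestion and direct size-selection steps described in this caption can be sketched in silico. The following is a minimal illustration, not part of any published protocol: the function names `digest` and `size_select`, the toy sequence, and the size window are all hypothetical, and the recognition site GAATTC (cut after the first base) is used only as a familiar example of a 6 bp cutter. Real protocols retain fragments according to adapter ligation and PCR behavior that this sketch ignores.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases of the recognition site that
    stay on the left-hand fragment (1 for G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # trailing fragment after the last cut
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the window (inclusive)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy sequence containing two recognition sites.
toy_genome = "AAGAATTCCCCCGAATTCTT"
fragments = digest(toy_genome, "GAATTC", 1)
selected = size_select(fragments, 8, 12)  # illustrative size window
```

Only the fragment flanked by both cut sites falls inside the illustrative window here; the two end fragments are too short and are discarded, which mirrors how size selection reduces the set of loci that reach the sequencer.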

Figure 2

Sources of error and bias in RADseq data. (a) Example of allele dropout for a RADseq protocol that uses size selection to reduce the number of loci to be sequenced. Gray lines represent chromosomes within one individual, red squares represent restriction cut sites, colored squares represent heterozygous SNPs, and brackets represent genomic regions that are sequenced. A mutation in restriction cut site B on haplotype 1 makes the post-digestion fragment containing the SNP too long to be retained during size selection for that haplotype. As a result, no loci on that fragment can be sequenced, and the individual appears homozygous at the heterozygous SNP. (b) See Figure 1 of Andrews et al. 2014. Example of fragments produced after PCR for one heterozygous locus for different RADseq protocols, and the reads retained after bioinformatic analyses. PCR duplicates are shown with the same symbol (circle, square, asterisk or triangle) as the parent fragment from the original template DNA. By chance, some alleles will amplify more than others during PCR. For all protocols, PCR duplicates will be identical in sequence composition and length to the original template molecule. For the original RADseq, this feature (i.e., identical length) can be used to identify and remove PCR duplicates bioinformatically, because original template molecules for a given locus will not be identical in length. For alternative RADseq methods, this feature cannot be used to identify PCR duplicates, because all original template molecules for a given locus are identical in length. High frequencies of PCR duplicates can cause heterozygotes to appear as homozygotes or can cause PCR errors to appear as true diversity.
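
The allele dropout scenario in panel (a) can be reproduced with a small simulation. This is a sketch under stated assumptions rather than a model of any specific protocol: the recognition site, the haplotype sequences, the SNP position, and the size windows are invented for illustration, and the helper names (`build_haplotype`, `digest_coords`, `observed_alleles`) are hypothetical.

```python
SITE = "GAATTC"   # illustrative 6 bp recognition site (assumption)
CUT_OFFSET = 1    # bases of the site left on the upstream fragment
SNP_POS = 13      # 0-based position of the heterozygous SNP in the toy haplotypes


def build_haplotype(snp_base, site_b):
    """Toy chromosome: cut site A, SNP, cut site B, spacer, cut site C."""
    return ("TT" + SITE + "ACGTA" + snp_base + "ACGTA"
            + site_b + "T" * 30 + SITE + "GG")


def digest_coords(seq):
    """Return (start, end) coordinates of fragments after digestion."""
    cuts = []
    pos = seq.find(SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return list(zip(bounds[:-1], bounds[1:]))


def observed_alleles(haplotypes, min_len, max_len):
    """Alleles at SNP_POS that survive digestion plus size selection."""
    alleles = set()
    for seq in haplotypes:
        for start, end in digest_coords(seq):
            in_window = min_len <= end - start <= max_len
            if in_window and start <= SNP_POS < end:
                alleles.add(seq[SNP_POS])
    return alleles


# Haplotype 1 carries a point mutation in cut site B (GAATTC -> GAATTA).
hap1 = build_haplotype("G", "GAATTA")
hap2 = build_haplotype("C", SITE)

narrow = observed_alleles([hap1, hap2], 10, 20)  # size-selected library
wide = observed_alleles([hap1, hap2], 10, 60)    # window wide enough for both
```

With the narrow window, only the 17 bp fragment from haplotype 2 is retained, so the individual appears homozygous for one allele; the 53 bp fragment from haplotype 1 is lost. Widening the window recovers both alleles, which illustrates why the dropout depends on the size-selection step rather than on the SNP itself.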

Figure within BOX 1

Numbers of articles citing the original papers describing each RADseq protocol over time. Data for 2015 are extrapolated from the numbers of citing articles recorded from January through September 2015. Protocols are arranged by order of first appearance in the literature. Data generated using Web of Science.
