Assessing DNA barcoding as a tool for species identification and data quality control - PubMed
Yong-Yi Shen et al. PLoS One. 2013.
Abstract
In recent years, the number of sequences from diverse species submitted to GenBank has grown explosively, and the data not infrequently contain errors. The problem is widely recognized in general, but less so for errors arising from invalid or incorrectly identified species, sample mix-ups, and contamination. DNA barcoding is a powerful tool for identifying and confirming species, and one very important application is forensics. In this study, we use DNA barcoding to detect erroneous sequences in GenBank by evaluating deep intraspecific and shallow interspecific divergences to discover possible taxonomic problems and other sources of error. We use the mitochondrial gene encoding cytochrome b (Cytb) from turtles to test the utility of barcoding for pinpointing potential errors. This gene is widely used in phylogenetic studies of this speciose group. Intraspecific variation is usually less than 2.0%, and in most cases less than 1.0%. In comparison, most species in our dataset differ by more than 10.0%. Overlapping intra- and interspecific percentages of variation mainly involve problematic species identifications and outdated taxonomies. Further, we detect the same kinds of problems in Cytb from Insectivora and Chiroptera. Upon applying this strategy to 47,524 mammalian CoxI sequences, we identify a suite of potentially problematic sequences. Our study reveals that erroneous sequences are not rare in GenBank and that DNA barcoding can serve to confirm sequencing accuracy and discover problems such as misidentified species, inaccurate taxonomies, contamination, and potential errors in sequencing.
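The screening idea in the abstract (flag pairs of sequences whose divergence is unusually deep within a species or unusually shallow between species) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the sequences are already aligned, uses an uncorrected p-distance because the abstract does not state which distance model was used, and borrows the 2.0% and 10.0% figures from the abstract purely as illustrative cutoffs. The function names `p_distance` and `flag_pairs` and the record layout are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    (Assumption: the abstract does not say which distance model was used.)
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def flag_pairs(records, intra_max=0.02, inter_min=0.10):
    """Return pairs whose divergence conflicts with their species labels.

    records: list of (accession, species, aligned_sequence) tuples.
    intra_max / inter_min: illustrative cutoffs taken from the abstract's
    2.0% and 10.0% figures, not validated thresholds.
    """
    flagged = []
    for (id1, sp1, s1), (id2, sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2 and d > intra_max:
            flagged.append((id1, id2, "deep intraspecific", d))
        elif sp1 != sp2 and d < inter_min:
            flagged.append((id1, id2, "shallow interspecific", d))
    return flagged


if __name__ == "__main__":
    # Toy records with made-up accessions and species labels.
    toy = [
        ("X1", "Species a", "ACGTACGTACGTACGTACGT"),
        ("X2", "Species a", "ACGTACGTACGTACGTACGA"),
        ("X3", "Species b", "ACGTACGTACGTACGTACGT"),
    ]
    for hit in flag_pairs(toy):
        print(hit)
```

A flagged pair is only a candidate for manual review: by the abstract's own account, overlaps can reflect misidentification, outdated taxonomy, contamination, or sequencing error, and the distance alone does not say which.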
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.
Figures

All three codon positions are used.

Majority of intraspecific divergences are less than 5.0% (A); majority of interspecific divergences exceed 8.0% (B).
Grants and funding
The authors thank National Natural Science Foundation of China (31172080), Key Program of West Light Foundation (2011), Youth Innovation Promotion Association of the Chinese Academy of Sciences, and Engineering Research Council (Discovery Grant 3148) for their support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.