ANGSD: Analysis of Next Generation Sequencing Data - PubMed
- ️Wed Jan 01 2014
Thorfinn Sand Korneliussen et al. BMC Bioinformatics. 2014.
Abstract
Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory-efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.
Results: We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.
Conclusions: The open-source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. It supports multiple input formats, including BAM and imputed Beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.
Figures

Data formats and call graph. A) Dependency of different data formats and analyses that can be performed in ANGSD. B) Simplified call graph. Red nodes indicate areas that are not threaded. With the exception of file readers, all analyses, printing and cleaning are done by objects derived from the abstract base class called general.

1D SFS for different GL models. SFS estimation based on a 170 megabase region from chromosome 1 using A) 12 CEU samples and B) 14 YRI samples from the 1000 Genomes Project. The analysis was performed for both the GATK GL model (green, light brown) and the SAMtools GL model (yellow, dark brown). Notice the difference in estimated variability (proportion of variable sites) for the two GL models, with GATK GL based analyses inferring more variable sites and an associated larger proportion of low-frequency alleles. The two categories of invariable sites have been removed and the distributions have been normalized so that the frequencies of all categories sum to one for each method.
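The normalization described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ANGSD code: the function names normalize_variable_sfs and proportion_variable are hypothetical, the input is assumed to be an unfolded SFS given as a list of expected site counts (2n + 1 categories for n diploid samples), and the toy numbers are invented for the example.

```python
def normalize_variable_sfs(sfs):
    """Drop the two invariable categories and renormalise the rest.

    `sfs` is assumed to be an unfolded site frequency spectrum: entry k is
    the (expected) number of sites with k derived alleles, so the first and
    last entries are the two invariable categories.
    """
    if len(sfs) < 3:
        raise ValueError("need at least one variable category")
    variable = sfs[1:-1]          # remove the first and last categories
    total = sum(variable)
    if total <= 0:
        raise ValueError("no variable sites to normalise")
    return [count / total for count in variable]


def proportion_variable(sfs):
    """Proportion of all sites that fall in a variable category."""
    return sum(sfs[1:-1]) / sum(sfs)


# Toy spectrum for two diploid samples (5 categories); values are made up.
toy_sfs = [90, 6, 3, 1, 0]
normalized = normalize_variable_sfs(toy_sfs)
variability = proportion_variable(toy_sfs)
```

Computing the proportion of variable sites before dropping the invariable categories keeps the quantity the caption compares between GL models, while the normalized spectrum only reflects the shape of the variable part.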

Joint SFS (2D-SFS). Two-dimensional SFS estimation based on a 170 megabase region from chromosome 1 using 12 CEU samples and 14 YRI samples from the 1000 Genomes Project.

Overlap between inferred SNPs with a critical p-value threshold of 10^-6 and not using BAQ. Venn diagram of the overlap between the SNP discovery for ANGSD, GATK and SAMtools for 33 CEU samples for chromosome 1. We used default parameters with GATK; for SAMtools we discarded reads with a mapping quality below 10. For ANGSD we chose a p-value threshold of 10^-6 and did not enable BAQ. In A, we used the SAMtools genotype likelihood model in ANGSD; in B, we used the GATK model in ANGSD.

Error rate vs call rate for called genotypes. Error rate and call rates for genotype calls based on different methods. The error rate is defined as the discordance rate between HapMap genotype calls and the calls for the same individuals sequenced in the 1000 Genomes Project. Genotypes were called for all sites for all individuals for all methods. Each genotype call has a score which was used to determine the call rate. Due to the discrete nature of some of the genotype scores we obtain a jagged curve.
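A curve of this kind can be sketched as follows. This is a minimal illustration, assuming each call is a (score, called genotype, reference genotype) tuple in which a higher score means a more confident call; the function name error_vs_call_rate and the toy data are hypothetical and not taken from the paper. Calls that share a score are admitted together, which is one way discrete scores can produce a jagged curve.

```python
def error_vs_call_rate(calls):
    """Return (call_rate, error_rate) pairs, one per distinct score threshold.

    `calls` is a list of (score, called_genotype, reference_genotype).
    Thresholds run from the strictest (highest score) to the most lenient.
    The error rate is the discordance rate among the calls that are kept.
    """
    if not calls:
        return []
    ordered = sorted(calls, key=lambda c: c[0], reverse=True)
    n = len(ordered)
    curve = []
    kept = 0      # calls with score >= current threshold
    errors = 0    # kept calls that disagree with the reference genotype
    i = 0
    while i < n:
        threshold = ordered[i][0]
        # Admit every call tied at this score before emitting a point.
        while i < n and ordered[i][0] == threshold:
            kept += 1
            errors += ordered[i][1] != ordered[i][2]
            i += 1
        curve.append((kept / n, errors / kept))
    return curve


# Toy data: genotypes coded as 0, 1 or 2 copies of the non-reference allele.
toy_calls = [(0.99, 0, 0), (0.99, 1, 1), (0.90, 1, 2), (0.50, 2, 2)]
curve = error_vs_call_rate(toy_calls)
```

Each point answers the question "if only calls at least this confident are kept, what fraction of genotypes is called and how many of those disagree with the reference set?".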