  • ️Thu Jan 01 2015

A Bayesian framework for de novo mutation calling in parents-offspring trios

Qiang Wei et al. Bioinformatics. 2015.

Abstract

Motivation: Spontaneous (de novo) mutations play an important role in the etiology of a range of complex diseases. Identifying de novo mutations (DNMs) in sporadic cases provides an effective strategy to find genes or genomic regions implicated in the genetics of disease. High-throughput next-generation sequencing enables genome- or exome-wide detection of DNMs by sequencing parents-proband trios. However, it is challenging to sift true mutations from the massive amount of noise caused by sequencing errors and alignment artifacts. One of the critical limitations of existing methods is that they assume the same pre-specified mutation rate for all genomic regions, which has a significant impact on DNM calling accuracy.

Results: In this study, we developed and implemented a novel Bayesian framework for DNM calling in trios (TrioDeNovo), which overcomes these limitations by disentangling prior mutation rates from the evaluation of the likelihood of the data, so that flexible priors can be adjusted post hoc at different genomic sites. Through extensive simulations and application to real data, we showed that this new method has improved sensitivity and specificity over existing methods, and provides a flexible framework to further improve the efficiency by incorporating proper priors. The accuracy is further improved using effective filtering based on sequence alignment characteristics.
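The separation of prior and likelihood described above can be illustrated with a minimal Python sketch. It assumes the caller reports, for each candidate site, a log10 likelihood ratio (Bayes factor) of the trio data under a de novo model versus a no-mutation model, and that a site-specific prior mutation rate is supplied afterwards. The function name, argument names and numeric values are hypothetical and are not taken from the TrioDeNovo source code.

```python
import math


def posterior_probability(log10_bayes_factor, prior_mutation_rate):
    """Combine a per-site likelihood ratio with a site-specific prior.

    Hypothetical helper: posterior odds = Bayes factor * prior odds.
    The Bayes factor depends only on the sequencing data, so the prior
    can be changed post hoc without re-evaluating the likelihood.
    """
    if not 0.0 < prior_mutation_rate < 1.0:
        raise ValueError("prior_mutation_rate must be strictly between 0 and 1")
    prior_odds = prior_mutation_rate / (1.0 - prior_mutation_rate)
    log10_posterior_odds = log10_bayes_factor + math.log10(prior_odds)
    # Guard against float overflow for overwhelming evidence.
    if log10_posterior_odds > 300:
        return 1.0
    posterior_odds = 10.0 ** log10_posterior_odds
    return posterior_odds / (1.0 + posterior_odds)


# The same data evidence re-scored under two assumed priors.
evidence = 8.0  # hypothetical log10 Bayes factor for one candidate site
p_default = posterior_probability(evidence, 1e-8)
p_hotspot = posterior_probability(evidence, 1e-4)
```

In this sketch the evidence term is computed once, and only the cheap prior adjustment is repeated when a different mutation rate is assumed for a region.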

Availability and implementation: The C++ source code implementing TrioDeNovo is freely available at https://medschool.vanderbilt.edu/cgg.

Contact: bingshan.li@vanderbilt.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Figures

Fig. 1.

ROC curves of de novo SNV mutations called by TrioDeNovo in simulated datasets with different coverage. Sensitivity and false positive rates were calculated for sequencing coverage of 17× (black), 34× (green), 51× (blue) and 68× (red) with flat prior odds for all candidates (color version of this figure is available at Bioinformatics online.)
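As an illustration of how such curves are typically derived, the following Python sketch computes sensitivity and false positive rate at every score threshold from a list of candidate calls with known truth labels, as is possible in simulated data. It is a generic ROC computation under stated assumptions, not the authors' evaluation code; the function name and the example scores are hypothetical.

```python
def roc_points(scores, is_true_dnm):
    """Return (false_positive_rate, sensitivity) pairs, one per threshold.

    scores: per-candidate confidence values (higher = more likely de novo).
    is_true_dnm: matching booleans marking the known true mutations.
    """
    pairs = sorted(zip(scores, is_true_dnm), key=lambda p: -p[0])
    n_pos = sum(1 for _, truth in pairs if truth)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false candidate")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Candidates with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical example: four candidates, two of them true DNMs.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
```

Each point corresponds to accepting every candidate at or above one score threshold, so lowering the threshold moves along the curve from (0, 0) to (1, 1).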

Fig. 2.

Comparison of ROC curves of de novo SNV mutations called by TrioDeNovo and DeNovoGear in the simulated datasets with coverage of 17× (A, D), 51× (B, E) and 68× (C, F). (A–C) The ROC curves calculated based on data simulated with the same mutation rate, and (D–F) the corresponding ROC curves with different prior mutation rates. Black lines represent TrioDeNovo calls with appropriate prior odds. Green, orange and red lines represent DeNovoGear calls with specified mutation rates of 10⁻⁸ (default), 10⁻⁴ and 10⁻¹², respectively (color version of this figure is available at Bioinformatics online).

Fig. 3.

ROC curves of de novo germline SNV mutations called by TrioDeNovo and DeNovoGear in the 1000GP CEU trio data with different coverage. ROC curves were calculated on datasets with 25% (A), 75% (B) and 100% (C) of the original whole genome data. Black lines represent TrioDeNovo calls with flat prior odds. Green, orange and red lines represent DeNovoGear calls with specified mutation rates of 10⁻⁸ (default), 10⁻⁴ and 10⁻¹², respectively (color version of this figure is available at Bioinformatics online).

Fig. 4.

ROC curves of de novo germline SNV mutations called by TrioDeNovo and DeNovoGear in the 1000GP CEU trio data, filtered using DNMFilter. Blue lines represent the ROC curves of the TrioDeNovo calls after applying DNMFilter; the other lines are the same as those in Figure 3 (color version of this figure is available at Bioinformatics online).
