Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences

Chris Tyler-Smith
Mon Apr 25 2016

Nature Genetics | Analysis

Journal name: Nature Genetics
Volume: 48
Pages: 593–599
Year published: 2016
DOI: 10.1038/ng.3559
Received: 08 November 2015
Accepted: 01 April 2016
Published online: 25 April 2016

Abstract

We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

At a glance

Figures

Figure 1: Discovery and validation of a representative Y-chromosome CNV.
(a) The GRCh37 reference sequence contains an inverted segmental duplication (yellow bars) within GRCh37 Y: 17,986,738–18,016,824 bp. We designed FISH probes to target the 3′ termini of the two segments (magenta and green bars labeled P1 and P3, respectively) and the unique region between them (light-blue bar labeled P2). A fourth probe used reference sequence BAC clone RP11-12J24 (dark-blue bar labeled P4). Unlabeled green and magenta bars represent expected cross-hybridization, and black bars represent CNV events called by GenomeSTRiP and aCGH. GenomeSTRiP called a 30-kb deletion that includes the duplicated segments and the unique spacer region, whereas aCGH lacks probes in the duplicated regions. (b) GenomeSTRiP discovery plot. The red curve indicates the normalized read depth for sample HG00183, as compared to the read depth for 1,232 other samples (gray) and the median depth (black). (c) Validation by aCGH. The intensity ratio for HG00183 (red) is shown relative to that for 1,233 other samples (gray) and the median ratio (black). (d) Fiber-FISH validation using the probes illustrated in a. The reference sample, HG00096, matches the human reference sequence, with green, magenta, light-blue, magenta, and green hybridizations occurring in sequence. In contrast, we observed just one green and one magenta hybridization in HG00183, indicating deletion of one copy of the segmental duplication and the central unique region. The coordinate scale that is consistent across a–c does not apply to d, and, although the BAC clone hybridization (dark blue) is shorter in the sample with the deletion, it appears longer owing to the variable degree of stretching inherent to the molecular combing process.
Figure 2: Y-chromosome phylogeny and haplogroup distribution.
Branch lengths are drawn proportional to the estimated times between successive splits, with the most ancient division occurring ~190 kya. Colored triangles represent the major clades, and the width of each base is proportional to one less than the corresponding sample size. We modeled expansions within eight of the major haplogroups (circled) (Fig. 4); dotted triangles represent the ages and sample sizes of the expanding lineages. Inset, world map indicating, for each of the 26 populations, the geographic source, sample size, and haplogroup distribution.
Figure 3: Mutation events.
(a) Bar plots show the percentage of each variant type stratum associated with 1, 2, 3–10, or more mutations across the phylogeny. (b) For STRs, scatterplots show the logarithm of the number of mutational events versus major allele length, stratified by motif length and the number of interruptions to the repeat structure. We have plotted regression lines with shaded confidence intervals for categories with at least ten data points, and we have omitted from the plots 44 STRs with motif lengths greater than 4 bp and 91 STRs whose mutation rate estimates were equal to the minimum threshold of 1 × 10⁻⁵ mutations per generation. This figure was generated with ggplot2 (ref. 32).
Figure 4: Explosive male-lineage expansions of the last 15,000 years.
Each circle represents a phylogenetic node whose branching pattern suggests rapid expansion. The horizontal axis indicates the timings of the expansions, and circle radii reflect growth rates—the minimum number of sons per generation, as estimated by our two-phase growth model. Nodes are grouped by continental superpopulation (AFR, African; AMR, admixed American; EAS, East Asian; EUR, European; SAS, South Asian) and colored by haplogroup. Line segments connect phylogenetically nested lineages. This figure was generated with ggplot2 (ref. 32).
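
The stratification described in the Figure 3a caption (variants associated with 1, 2, 3–10, or more mutations across the phylogeny) can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratum` and `stratum_percentages` and the example counts are hypothetical, chosen only to show how per-variant mutation counts could be grouped and converted to percentages for one variant type.

```python
from collections import Counter

# Stratum labels follow the Figure 3a caption: 1, 2, 3-10, or more mutations.
STRATA = ("1", "2", "3-10", ">10")


def stratum(n_mutations: int) -> str:
    """Map the number of mutations inferred for one variant to its stratum."""
    if n_mutations < 1:
        raise ValueError("a variant needs at least one mutation on the tree")
    if n_mutations == 1:
        return "1"
    if n_mutations == 2:
        return "2"
    if n_mutations <= 10:
        return "3-10"
    return ">10"


def stratum_percentages(mutation_counts):
    """Return the percentage of variants falling into each stratum."""
    tally = Counter(stratum(n) for n in mutation_counts)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no variants supplied")
    # Counter returns 0 for strata that never occur.
    return {label: 100.0 * tally[label] / total for label in STRATA}


# Hypothetical mutation counts for eight variants (illustrative only,
# not data from the study).
example_counts = [1, 1, 1, 2, 5, 12, 1, 3]
percentages = stratum_percentages(example_counts)
print(percentages)
```

Running the function once per variant type (SNVs, indels, STRs, CNVs, and so on) would give one bar per type, which is the layout the caption describes.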

References

1. Jobling, M.A. & Tyler-Smith, C. The human Y chromosome: an evolutionary marker comes of age. Nat. Rev. Genet. 4, 598–612 (2003).
2. Wei, W. et al. A calibrated human Y-chromosomal phylogeny based on resequencing. Genome Res. 23, 388–395 (2013).
3. Poznik, G.D. et al. Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science 341, 562–565 (2013).
4. Wilson Sayres, M.A., Lohmueller, K.E. & Nielsen, R. Natural selection reduced diversity on human Y chromosomes. PLoS Genet. 10, e1004064 (2014).
5. Karmin, M. et al. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 25, 459–466 (2015).
6. Batini, C. et al. Large-scale recent expansion of European patrilineages shown by population resequencing. Nat. Commun. 6, 7152 (2015).
7. Sikora, M.J., Colonna, V., Xue, Y. & Tyler-Smith, C. Modeling the contrasting Neolithic male lineage expansions in Europe and Africa. Investig. Genet. 4, 25 (2013).
8. Helgason, A. et al. The Y-chromosome point mutation rate in humans. Nat. Genet. 47, 453–457 (2015).
9. Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014).
10. Balanovsky, O. et al. Deep phylogenetic analysis of haplogroup G1 provides estimates of SNP and STR mutation rates on the human Y-chromosome and reveals migrations of Iranic speakers. PLoS One 10, e0122968 (2015).
11. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
12. Handsaker, R.E. et al. Large multiallelic copy number variations in humans. Nat. Genet. 47, 296–303 (2015).
13. Bellos, E., Johnson, M.R. & Coin, L.J.M. cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data. Genome Biol. 13, R120 (2012).
14. Hammer, M.F. et al. Out of Africa and back again: nested cladistic analysis of human Y chromosome variation. Mol. Biol. Evol. 15, 427–441 (1998).
15. Groucutt, H.S. et al. Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. 24, 149–164 (2015).
16. Fu, Q. et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015).
17. Zhang, F., Gu, W., Hurles, M.E. & Lupski, J.R. Copy number variation in human health, disease, and evolution. Annu. Rev. Genomics Hum. Genet. 10, 451–481 (2009).
18. Willems, T. et al. Population-scale sequencing data enables precise estimates of Y-STR mutation rates. Am. J. Hum. Genet. http://dx.doi.org/10.1016/j.ajhg.2016.04.001 (2016).
19. Kumar, P., Henikoff, S. & Ng, P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
20. Sudmant, P.H. et al. Global diversity, population stratification, and selection of human copy-number variation. Science 349, aab3761 (2015).
21. Raghavan, M. et al. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science 349, aab3884 (2015).
22. de Filippo, C., Bostoen, K., Stoneking, M. & Pakendorf, B. Bringing together linguistic and genetic evidence to test the Bantu expansion. Proc. R. Soc. B Biol. Sci. 279, 3256–3263 (2012).
23. Jobling, M.A., Hollox, E., Hurles, M., Kivisild, T. & Tyler-Smith, C. Human Evolutionary Genetics 2nd edn (Garland Science, 2014).
24. Allentoft, M.E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
25. Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
26. Harding, A.F. European Societies in the Bronze Age (Cambridge University Press, 2000).
27. Bryant, E.F. & Patton, L.L. The Indo-Aryan Controversy: Evidence and Inference in Indian History (Routledge, 2005).
28. Xue, Y. et al. Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Curr. Biol. 19, 1453–1457 (2009).
29. Betzig, L. Means, variances, and ranges in reproductive success: comparative evidence. Evol. Hum. Behav. 33, 309–317 (2012).
30. Zerjal, T. et al. The genetic legacy of the Mongols. Am. J. Hum. Genet. 72, 717–721 (2003).
31. Balaresque, P. et al. Y-chromosome descent clusters and male differential reproductive success: young lineage expansions dominate Asian pastoral nomadic populations. Eur. J. Hum. Genet. 23, 1413–1422 (2015).
32. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).
33. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
34. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv http://arxiv.org/abs/1207.3907 (2012).
35. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
36. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P. & McVean, G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44, 226–232 (2012).
37. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
38. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
39. Pique-Regi, R. et al. Sparse representation and Bayesian detection of genome copy number alterations from microarray data. Bioinformatics 24, 309–318 (2008).
40. Pique-Regi, R., Cáceres, A. & González, J.R. R-Gada: a fast and flexible pipeline for copy number analysis in association studies. BMC Bioinformatics 11, 380 (2010).
41. Conrad, D.F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).
42. Perry, G.H. et al. Copy number variation and evolution in humans and chimpanzees. Genome Res. 18, 1698–1710 (2008).
43. Polley, S. et al. Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc. Natl. Acad. Sci. USA 112, 5105–5110 (2015).
44. Carpenter, D. et al. Obesity, starch digestion and amylase: association between copy number variants at human salivary (AMY1) and pancreatic (AMY2) amylase genes. Hum. Mol. Genet. 24, 3472–3480 (2015).
45. Verma, R.S. & Babu, A. Human Chromosomes: Principles & Techniques 2nd edn. (McGraw-Hill, 1995).
46. Gribble, S.M. et al. Massively parallel sequencing reveals the complex structure of an irradiated human chromosome on a mouse background in the Tc1 model of Down syndrome. PLoS One 8, e60482 (2013).
47. Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981).
48. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
49. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).
50. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
51. Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
52. van Oven, M. & Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009).
53. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
54. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
55. Kloss-Brandstätter, A. et al. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum. Mutat. 32, 25–32 (2011).
56. Hudson, R.R. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002).
57. Lohmueller, K.E., Bustamante, C.D. & Clark, A.G. Methods for human demographic inference using haplotype patterns from genomewide single-nucleotide polymorphism data. Genetics 182, 217–231 (2009).
58. Lohmueller, K.E., Bustamante, C.D. & Clark, A.G. The effect of recent admixture on inference of ancient human population history. Genetics 185, 611–622 (2010).

Author information

Affiliations

Program in Biomedical Informatics, Stanford University, Stanford, California, USA.
- G David Poznik
Department of Genetics, Stanford University, Stanford, California, USA.
- G David Poznik,
- Fernando L Mendez,
- Peter A Underhill &
- Carlos D Bustamante
Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
- Yali Xue,
- Andrea Massaia,
- Qasim Ayub,
- Shane A McCarthy,
- Yuan Chen,
- Ruby Banerjee,
- Maria Cerezo,
- Sandra Louzada,
- Graham R S Ritchie,
- Tomas W Fitzgerald,
- Erik Garrison,
- Fengtang Yang &
- Chris Tyler-Smith
Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
- Thomas F Willems
New York Genome Center, New York, New York, USA.
- Thomas F Willems,
- Melissa Gymrek &
- Yaniv Erlich
School of Life Sciences, Arizona State University, Tempe, Arizona, USA.
- Melissa A Wilson Sayres
Center for Evolution and Medicine, Biodesign Institute, Arizona State University, Tempe, Arizona, USA.
- Melissa A Wilson Sayres
Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA.
- Apurva Narechania &
- Rob Desalle
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
- Seva Kashin &
- Robert E Handsaker
Department of Genetic Medicine, Weill Cornell Medical College, New York, New York, USA.
- Juan L Rodriguez-Flores
Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia.
- Haojing Shao &
- Lachlan Coin
Harvard–MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
- Melissa Gymrek
Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA.
- Ankit Malhotra,
- Eliza Cerveira,
- Mallory Romanovitch,
- Chengsheng Zhang &
- Charles Lee
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK.
- Graham R S Ritchie,
- Xiangqun Zheng-Bradley,
- Paul Flicek,
- Daniel R Zerbino &
- Laura Clarke
Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA.
- Anthony Marcketta &
- Adam Auton
Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA.
- David Mittelman
Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA.
- David Mittelman
Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA.
- Gonçalo R Abecasis
Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.
- Steven A McCarroll &
- Robert E Handsaker
Department of Life Sciences, Ewha Womans University, Seoul, Republic of Korea.
- Charles Lee
Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, New York, USA.
- Yaniv Erlich
Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, USA.
- Yaniv Erlich
Department of Biomedical Data Science, Stanford University, Stanford, California, USA.
- Carlos D Bustamante

Consortia

The 1000 Genomes Project Consortium
A list of members and affiliations appears in the Supplementary Note.

Contributions

G.D.P., Y.X., C.D.B., and C.T.-S. conceived and designed the project. R.B., S.L., and F.Y. generated FISH data. A. Malhotra, M.R., E.C., C.Z., and C.L. generated aCGH data. G.D.P., Y.X., F.L.M., T.F.W., A. Massaia, M.A.W.S., Q.A., S.A. McCarthy, A.N., S.K., Y.C., J.L.R.-F., M.C., H.S., M.G., R.D., G.R.S.R., T.W.F., E.G., A. Marcketta, D.M., X.Z.-B., G.R.A., S.A. McCarroll, P.F., P.A.U., L. Coin, D.R.Z., L. Clarke, A.A., Y.E., R.E.H., C.D.B., and C.T.-S. analyzed the data. G.D.P., Y.X., F.L.M., T.F.W., A. Massaia, M.A.W.S., Q.A., and C.T.-S. wrote the manuscript. All authors reviewed, revised, and provided feedback on the manuscript.

Competing financial interests

G.D.P. and A.A. are employees of 23andMe. P.F. is a member of the Scientific Advisory Board (SAB) for Omicia, Inc. P.A.U. has consulted for and owns stock options of 23andMe. Y.E. is an SAB member of Identify Genomics, BigDataBio, and Solve, Inc. C.D.B. is on the SABs of AncestryDNA, BigDataBio, Etalon DX, Liberty Biosecurity, and Personalis. He is also a founder and SAB chair of IdentifyGenomics. None of these entities had a role in the design, execution, interpretation, or presentation of this study.

Supplementary information

PDF files

Supplementary Text and Figures (17,014 KB)
Supplementary Figures 1–31, Supplementary Tables 1–19 and Supplementary Note.

Zip files

Supplementary Data (6,335 KB)
Supplementary Data on SNVs, CNVs, STRs, haplogroups, phylogenetic analyses, functional annotations, mtDNA analysis, and expansion analyses.
