In vivo enhancer analysis of human conserved non-coding sequences - Nature

  • Rubin, Edward M.
  • Sun Nov 05 2006

Abstract

Identifying the sequences that direct the spatial and temporal expression of genes and defining their function in vivo remains a significant challenge in the annotation of vertebrate genomes. One major obstacle is the lack of experimentally validated training sets. In this study, we made use of extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements, and characterized the in vivo enhancer activity of a large group of non-coding elements in the human genome that are conserved in human–pufferfish, Takifugu (Fugu) rubripes, or ultraconserved (ref. 1) in human–mouse–rat. We tested 167 of these extremely conserved sequences in a transgenic mouse enhancer assay. Here we report that 45% of these sequences functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While directing expression in a broad range of anatomical structures in the embryo, the majority of the 75 enhancers directed expression to various regions of the developing nervous system. We identified sequence signatures enriched in a subset of these elements that targeted forebrain expression, and used these features to rank all 3,100 non-coding elements in the human genome that are conserved between human and Fugu. The testing of the top predictions in transgenic mice resulted in a threefold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of human gene enhancers that have been characterized in vivo, and illustrate the utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.
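The abstract does not spell out how the forebrain sequence signatures were scored, so the following is only an illustrative sketch of the general idea of ranking candidate elements by features enriched in a validated training set. It assumes a simple k-mer over-representation score; the function names, the toy sequences and the choice of k = 3 are hypothetical and are not taken from the paper. The last line checks the reported proportion of positive elements (75 of 167).

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment_weights(positives, background, k=3, pseudo=1.0):
    """Log2 ratio of k-mer frequency in positives vs. background.

    Pseudocounts keep the ratio finite for k-mers seen in only one set.
    """
    pos = kmer_counts(positives, k)
    bg = kmer_counts(background, k)
    vocab = set(pos) | set(bg)
    pos_total = sum(pos.values()) + pseudo * len(vocab)
    bg_total = sum(bg.values()) + pseudo * len(vocab)
    return {
        m: log2(((pos[m] + pseudo) / pos_total) / ((bg[m] + pseudo) / bg_total))
        for m in vocab
    }


def score(seq, weights, k=3):
    """Mean enrichment weight over the k-mers of seq; unseen k-mers add 0."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(n)) / n


def rank(candidates, weights, k=3):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(candidates, key=lambda name: score(candidates[name], weights, k),
                  reverse=True)


# Toy, made-up sequences standing in for validated forebrain enhancers
# and for tested elements without forebrain activity.
forebrain = ["TAATTATAATTG", "GGTAATTACC"]
other = ["GCGCGCGCGC", "CCGGCCGGAA"]
weights = enrichment_weights(forebrain, other)

# Hypothetical untested candidates to be prioritised.
candidates = {"elem_A": "GCGCCGGCGC", "elem_B": "ATAATTAGTAATT"}
ranking = rank(candidates, weights)

# Proportion of tested elements reported as reproducible enhancers.
hit_rate = 75 / 167
```

In this toy setting the AT-rich candidate ranks first because its k-mers are over-represented in the positive set; a real analysis would need a much larger training set and a more careful background model.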

References

  1. Bejerano, G. et al. Ultraconserved elements in the human genome. Science 304, 1321–1325 (2004)

  2. Roeder, R. G. & Rutter, W. J. Multiple forms of DNA-dependent RNA polymerase in eukaryotic organisms. Nature 224, 234–237 (1969)

  3. Goldberg, M. L. Sequence Analysis of Drosophila Histone Genes. Ph.D. thesis, Stanford Univ. (1979)

  4. Stathopoulos, A. & Levine, M. Genomic regulatory networks and animal development. Dev. Cell 9, 449–462 (2005)

  5. Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003)

  6. Emison, E. S. et al. A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature 434, 857–863 (2005)

  7. Kleinjan, D. A. & van Heyningen, V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. 76, 8–32 (2005)

  8. Lettice, L. A. et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 (2003)

  9. Boffelli, D. et al. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394 (2003)

  10. Nobrega, M. A., Ovcharenko, I., Afzal, V. & Rubin, E. M. Scanning human gene deserts for long-range enhancers. Science 302, 413 (2003)

  11. Prabhakar, S. et al. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 16, 855–863 (2006)

  12. Woolfe, A. et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3, e7 (2005)

  13. Kothary, R. et al. Inducible expression of an hsp68-lacZ hybrid gene in transgenic mice. Development 105, 707–714 (1989)

  14. Rojas, A. et al. Gata4 expression in lateral mesoderm is downstream of BMP4 and is activated directly by Forkhead and GATA transcription factors through a distal enhancer element. Development 132, 3405–3417 (2005)

  15. Rossant, J., Zirngibl, R., Cado, D., Shago, M. & Giguere, V. Expression of a retinoic acid response element-hsplacZ transgene defines specific domains of transcriptional activity during mouse embryogenesis. Genes Dev. 5, 1333–1344 (1991)

  16. Yamagishi, H. et al. Tbx1 is regulated by tissue-specific forkhead proteins through a common Sonic hedgehog-responsive enhancer. Genes Dev. 17, 269–281 (2003)

  17. Boffelli, D., Nobrega, M. A. & Rubin, E. M. Comparative genomics at the vertebrate extremes. Nature Rev. Genet. 5, 456–465 (2004)

  18. Ahituv, N., Prabhakar, S., Poulin, F., Rubin, E. M. & Couronne, O. Mapping cis-regulatory domains in the human genome using multi-species conservation of synteny. Hum. Mol. Genet. 14, 3057–3063 (2005)

  19. Kohlhase, J., Wischermann, A., Reichenbach, H., Froster, U. & Engel, W. Mutations in the SALL1 putative transcription factor gene cause Townes-Brocks syndrome. Nature Genet. 18, 81–83 (1998)

  20. Buck, A., Kispert, A. & Kohlhase, J. Embryonic expression of the murine homologue of SALL1, the gene mutated in Townes–Brocks syndrome. Mech. Dev. 104, 143–146 (2001)

  21. Carroll, S. B. Evolution at two levels: on genes and form. PLoS Biol. 3, e245 (2005)

  22. Davidson, E. H. Genomic Regulatory Systems: In Development and Evolution (Academic, San Diego, 2001)

  23. Lee, T. I. et al. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125, 301–313 (2006)

  24. Bard, J. L. et al. An internet-accessible database of mouse developmental anatomy based on a systematic nomenclature. Mech. Dev. 74, 111–120 (1998)

  25. Gray, P. A. et al. Mouse brain organization revealed through direct genome-scale TF expression analysis. Science 306, 2255–2257 (2004)

  26. Poulin, F. et al. In vivo characterization of a vertebrate ultraconserved enhancer. Genomics 85, 774–781 (2005)

  27. van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998)

  28. Kurokawa, D. et al. Regulation of Otx2 expression and its functions in mouse forebrain and midbrain. Development 131, 3319–3331 (2004)

  29. Zhou, J., Zwicker, J., Szymanski, P., Levine, M. & Tjian, R. TAFII mutations disrupt Dorsal activation in the Drosophila embryo. Proc. Natl Acad. Sci. USA 95, 13483–13488 (1998)

Acknowledgements

Research was conducted at the E. O. Lawrence Berkeley National Laboratory, under the Programs for Genomic Application, funded by the National Heart, Lung, and Blood Institute, USA as well as the National Human Genome Research Institute, USA, and performed under a Department of Energy Contract with the University of California.

Author information

Author notes

  1. Marcelo A. Nobrega

    Present address: Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637, USA

Authors and Affiliations

  1. US Department of Energy Joint Genome Institute, Walnut Creek, California, 94598, USA

    Len A. Pennacchio, Inna Dubchak, Olivier Couronne & Edward M. Rubin

  2. Genomics Division, MS 84-171, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA

    Len A. Pennacchio, Nadav Ahituv, Alan M. Moses, Shyam Prabhakar, Marcelo A. Nobrega, Malak Shoukry, Simon Minovitsky, Inna Dubchak, Amy Holt, Keith D. Lewis, Ingrid Plajzer-Frick, Jennifer Akiyama, Veena Afzal, Olivier Couronne, Michael B. Eisen, Axel Visel & Edward M. Rubin

  3. Molecular and Cellular Biology Department, University of California-Berkeley, California, 94720, USA

    Michael B. Eisen

  4. Cardiovascular Research Institute, University of California, San Francisco, California, 94143-2240, USA

    Sarah De Val & Brian L. Black

Authors

  1. Len A. Pennacchio

  2. Nadav Ahituv

  3. Alan M. Moses

  4. Shyam Prabhakar

  5. Marcelo A. Nobrega

  6. Malak Shoukry

  7. Simon Minovitsky

  8. Inna Dubchak

  9. Amy Holt

  10. Keith D. Lewis

  11. Ingrid Plajzer-Frick

  12. Jennifer Akiyama

  13. Sarah De Val

  14. Veena Afzal

  15. Brian L. Black

  16. Olivier Couronne

  17. Michael B. Eisen

  18. Axel Visel

  19. Edward M. Rubin

Corresponding author

Correspondence to Len A. Pennacchio.

Ethics declarations

Competing interests

Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1.

A summary of all the human conserved noncoding fragments tested for enhancer activity at embryonic day 11.5. Enhancer ID refers to a unique identifier defined at http://enhancer.lbl.gov. (XLS 37 kb)

Supplementary Table 2.

A compilation of human–Fugu conserved non-coding elements in the human genome. (XLS 208 kb)

Supplementary Table 3.

The top 30 forebrain enhancer predictions in the human genome. The strategy to generate this list can be found in the Supplementary Methods. (XLS 18 kb)

Supplementary Methods.

An expanded version of the Materials and Methods. (DOC 61 kb)

About this article

Cite this article

Pennacchio, L., Ahituv, N., Moses, A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006). https://doi.org/10.1038/nature05295

  • Received: 14 June 2006

  • Accepted: 22 September 2006

  • Published: 05 November 2006

  • Issue Date: 23 November 2006

  • DOI: https://doi.org/10.1038/nature05295

Editorial Summary

Gene regulators unmasked

Identifying the non-coding DNA sequences that act at a distance to regulate patterns of gene expression is not a simple matter; one useful pointer is evolutionary sequence conservation. An in vivo analysis of 167 non-coding elements in the human genome that are extremely conserved, based on comparisons with pufferfish, rat and mouse genomes, has identified 75 previously unknown tissue-specific enhancers. These are active in transgenic mouse embryos at embryonic day 11.5, most of them directing expression in the developing nervous system. The success of this method suggests that the further 5,500 non-coding sequences conserved between humans and pufferfish may yield another new batch of gene enhancers.