The state of play in higher eukaryote gene annotation - Nature Reviews Genetics

  • Harrow, Jennifer
  • Mon Oct 24 2016
  • Gerstein, M. B. et al. What is a gene, post-ENCODE? History and updated definition. Genome Res. 17, 669–681 (2007). This influential article attempts to rationalize a modern description of the gene in the context of transcriptional complexity.

  • Harrow, J. et al. GENCODE: The reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012). This provides a detailed description of the GENCODE annotation pipeline.

  • Kim, V. N., Han, J. & Siomi, M. C. Biogenesis of small RNAs in animals. Nat. Rev. Mol. Cell Biol. 10, 126–139 (2009).

  • Andersson, L. et al. Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biol. 16, 57 (2015).

  • O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016). This is an excellent starting point for exploring the NCBI annotation resources.

  • McGarvey, K. M. et al. Mouse genome annotation by the RefSeq project. Mamm. Genome 26, 379–390 (2015).

  • Mudge, J. M. & Harrow, J. Creating reference gene annotation for the mouse C57BL6/J genome assembly. Mamm. Genome 26, 366–378 (2015).

  • Berardini, T. Z. et al. The Arabidopsis information resource: making and mining the “gold standard” annotated reference plant genome. Genesis 53, 474–485 (2015).

  • Howe, K. L. et al. WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 44, D774–D780 (2016).

  • Attrill, H. et al. FlyBase: establishing a Gene Group resource for Drosophila melanogaster. Nucleic Acids Res. 44, D786–D792 (2016).

  • Elsik, C. G. et al. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15, 86 (2014).

  • Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13 (2016). This provides a detailed description and comparison of various RNA-seq analytical pipelines.

  • Boutet, E. et al. UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view. Methods Mol. Biol. 1374, 23–54 (2016). The UniProt and Swiss-Prot resources are outlined here.

  • Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 (Suppl. 2), ii215–ii225 (2003).

  • Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).

  • Yandell, M. & Ence, D. A beginner's guide to eukaryotic genome annotation. Nat. Rev. Genet. 13, 329–342 (2012).

  • Gray, K. A., Yates, B., Seal, R. L., Wright, M. W. & Bruford, E. A. Genenames.org: the HGNC resources in 2015. Nucleic Acids Res. 43, D1079–D1085 (2015).

  • Guigo, R. et al. EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol. 7, S2 (2006).

  • Zhang, G. et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346, 1311–1320 (2014).

  • Eory, L. et al. Avianbase: a community resource for bird genomics. Genome Biol. 16, 21 (2015).

  • Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: unsupervised RNA-Seq-Based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767–769 (2016).

  • Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).

  • Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).

  • Loveland, J. E., Gilbert, J. G., Griffiths, E. & Harrow, J. L. Community gene annotation in practice. Database (Oxford) 2012, bas009 (2012).

  • Pennisi, E. Ideas fly at gene-finding jamboree. Science 287, 2182–2184 (2000).

  • Archibald, A. L. et al. Pig genome sequence—analysis and publication strategy. BMC Genomics 11, 438 (2010).

  • Lee, E. et al. Web Apollo: a web-based genomic annotation editing platform. Genome Biol. 14, R93 (2013).

  • Giraldo-Calderon, G. I. et al. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res. 43, D707–D713 (2015).

  • Dawson, H. D. et al. Structural and functional annotation of the porcine immunome. BMC Genomics 14, 332 (2013).

  • The UK 10K Consortium. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).

  • Guo, L., Gao, Z. & Qian, Q. Application of resequencing to rice genomics, functional genomics and evolutionary analysis. Rice (N.Y.) 7, 4 (2014).

  • Foote, A. D. et al. Genome-culture coevolution promotes rapid divergence of killer whale ecotypes. Nat. Commun. 7, 11693 (2016).

  • Adams, D. J., Doran, A. G., Lilue, J. & Keane, T. M. The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes. Mamm. Genome 26, 403–412 (2015).

  • Baker, M. Structural variation: the genome's hidden architecture. Nat. Methods 9, 133–137 (2012).

  • Zarrei, M., MacDonald, J. R., Merico, D. & Scherer, S. W. A copy number variation map of the human genome. Nat. Rev. Genet. 16, 172–183 (2015).

  • Trowsdale, J. & Knight, J. C. Major histocompatibility complex genomics and human disease. Annu. Rev. Genom. Hum. Genet. 14, 301–323 (2013).

  • Hirayasu, K. & Arase, H. Functional and genetic diversity of leukocyte immunoglobulin-like receptor and implication for disease associations. J. Hum. Genet. 60, 703–708 (2015).

  • Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47, 199–208 (2015). In this study, thousands of human RNA-seq libraries are combined to generate almost 60,000 putative lncRNA genes.

  • Filichkin, S. A. et al. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 20, 45–58 (2010).

  • Mudge, J. M., Frankish, A. & Harrow, J. Functional transcriptomics in the post-ENCODE era. Genome Res. 23, 1961–1973 (2013).

  • Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods 10, 1177–1184 (2013).

  • Cho, H. et al. High-resolution transcriptome analysis with long-read RNA sequencing. PLoS ONE 9, e108095 (2014).

  • Tilgner, H. et al. Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events. Nat. Biotechnol. 33, 736–742 (2015).

  • Mercer, T. R. et al. Targeted sequencing for gene discovery and quantification using RNA CaptureSeq. Nat. Protoc. 9, 989–1009 (2014).

  • Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).

  • The FANTOM Consortium et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014). The leading publication of the FANTOM5 project, providing detailed analysis of hundreds of human and mouse CAGE experiments.

  • Derti, A. et al. A quantitative atlas of polyadenylation in five mammals. Genome Res. 22, 1173–1183 (2012).

  • Boley, N. et al. Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nat. Biotechnol. 32, 341–346 (2014).

  • Hezroni, H. et al. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 11, 1110–1122 (2015).

  • Sisu, C. et al. Comparative analysis of pseudogenes across three phyla. Proc. Natl Acad. Sci. USA 111, 13361–13366 (2014).

  • Frankish, A. & Harrow, J. GENCODE pseudogenes. Methods Mol. Biol. 1167, 129–155 (2014).

  • Carelli, F. N. et al. The life history of retrocopies illuminates the evolution of new mammalian genes. Genome Res. 26, 301–314 (2016).

  • Zhang, Z. et al. PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 22, 1437–1439 (2006).

  • Pei, B. et al. The GENCODE pseudogene resource. Genome Biol. 13, R51 (2012).

  • Kelemen, O. et al. Function of alternative splicing. Gene 514, 1–30 (2013).

  • Yang, X. et al. Widespread expansion of protein interaction capabilities by alternative splicing. Cell 164, 805–817 (2016).

  • Pickrell, J. K., Pai, A. A., Gilad, Y. & Pritchard, J. K. Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet. 6, e1001236 (2010).

  • Hao, Y. et al. Semi-supervised learning predicts approximately one third of the alternative splicing isoforms as functional proteins. Cell Rep. 12, 183–189 (2015).

  • Rodriguez, J. M. et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 41, D110–D117 (2013).

  • Farrell, C. M. et al. Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res. 42, D865–D872 (2014).

  • Bassett, A. R. et al. Considerations when investigating lncRNA function in vivo. eLife 3, e03058 (2014).

  • Derrien, T., Guigo, R. & Johnson, R. The long non-coding RNAs: a new (P)layer in the “dark matter”. Front. Genet. 2, 107 (2011).

  • Hangauer, M. J., Vaughn, I. W. & McManus, M. T. Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet. 9, e1003569 (2013).

  • van Bakel, H., Nislow, C., Blencowe, B. J. & Hughes, T. R. Most “dark matter” transcripts are associated with known genes. PLoS Biol. 8, e1000371 (2010).

  • Peccarelli, M. & Kebaara, B. W. Regulation of natural mRNAs by the nonsense-mediated mRNA decay pathway. Eukaryot. Cell 13, 1126–1135 (2014).

  • Lareau, L. F. & Brenner, S. E. Regulation of splicing factors by alternative splicing and NMD is conserved between kingdoms yet evolutionarily flexible. Mol. Biol. Evol. 32, 1072–1079 (2015).

  • Wong, J. J. et al. Orchestrated intron retention regulates normal granulocyte differentiation. Cell 154, 583–595 (2013).

  • Braunschweig, U. et al. Widespread intron retention in mammals functionally tunes transcriptomes. Genome Res. 24, 1774–1786 (2014). Demonstrates that intron retention affects three-quarters of mammalian genes, and suggests widespread involvement in gene regulation.

  • Capell, A., Fellerer, K. & Haass, C. Progranulin transcripts with short and long 5′ untranslated regions (UTRs) are differentially expressed via posttranscriptional and translational repression. J. Biol. Chem. 289, 25879–25889 (2014).

  • Barbosa, C., Peixeiro, I. & Romao, L. Gene expression regulation by upstream open reading frames and human disease. PLoS Genet. 9, e1003529 (2013).

  • Yeh, H. S. & Yong, J. Alternative polyadenylation of mRNAs: 3′-untranslated region matters in gene expression. Mol. Cells 39, 281–285 (2016).

  • Barrett, L. W., Fletcher, S. & Wilton, S. D. Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell. Mol. Life Sci. 69, 3613–3634 (2012).

  • Mudge, J. M. et al. The origins, evolution, and functional potential of alternative splicing in vertebrates. Mol. Biol. Evol. 28, 2949–2959 (2011).

  • Barash, Y. & Garcia, J. V. Predicting alternative splicing. Methods Mol. Biol. 1126, 411–423 (2014).

  • Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 11, 1114–1125 (2014). An obvious starting point to explore strategies for the analysis of mass-spectrometry data in genomics.

  • Kim, M. S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).

  • Wilming, L. G. et al. The vertebrate genome annotation (Vega) database. Nucleic Acids Res. 36, D753–D760 (2008).

  • Ezkurdia, I., Vazquez, J., Valencia, A. & Tress, M. Analyzing the first drafts of the human proteome. J. Proteome Res. (2014).

  • Wright, J. C. et al. Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow. Nat. Commun. 7, 11778 (2016).

  • Kim, T. K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).

  • Ingolia, N. T. Ribosome profiling: new views of translation, from single codons to genome scale. Nat. Rev. Genet. 15, 205–213 (2014).

  • Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).

  • Jackson, R. & Standart, N. The awesome power of ribosome profiling. RNA 21, 652–654 (2015).

  • Ingolia, N. T. Ribosome footprint profiling of translation throughout the genome. Cell 165, 22–33 (2016). A primer on the use of ribosome profiling (RP) from one of the key developers of the technique.

  • Raj, A. et al. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. eLife 5, e13328 (2016).

  • Mumtaz, M. A. & Couso, J. P. Ribosomal profiling adds new coding sequences to the proteome. Biochem. Soc. Trans. 43, 1271–1276 (2015).

  • Graur, D. et al. On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol. Evol. 5, 578–590 (2013).

  • Xie, S. Q. et al. RPFdb: a database for genome wide information of translated mRNA generated from ribosome profiling. Nucleic Acids Res. 44, D254–D258 (2016).

  • Goff, L. A. & Rinn, J. L. Linking RNA biology to lncRNAs. Genome Res. 25, 1456–1465 (2015).

  • Palazzo, A. F. & Lee, E. S. Non-coding RNA: what is functional and what is junk? Front. Genet. 6, 2 (2015).

  • Kutter, C. et al. Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet. 8, e1002841 (2012).

  • Ulitsky, I. Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nat. Rev. Genet. 17, 601–614 (2016).

  • Sleutels, F., Zwart, R. & Barlow, D. P. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415, 810–813 (2002).

  • Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).

  • Lai, F. & Shiekhattar, R. Enhancer RNAs: the new molecules of transcription. Curr. Opin. Genet. Dev. 25, 38–42 (2014).

  • Scruggs, B. S. et al. Bidirectional transcription arises from two distinct hubs of transcription factor binding and active chromatin. Mol. Cell 58, 1101–1112 (2015).

  • Furio-Tari, P., Tarazona, S., Gabaldon, T., Enright, A. J. & Conesa, A. spongeScan: A web for detecting microRNA binding elements in lncRNA sequences. Nucleic Acids Res. (2016).

  • Novikova, I. V., Hennelly, S. P. & Sanbonmatsu, K. Y. Tackling structures of long noncoding RNAs. Int. J. Mol. Sci. 14, 23672–23684 (2013).

  • Konig, J., Zarnack, K., Luscombe, N. M. & Ule, J. Protein-RNA interactions: new genomic technologies and perspectives. Nat. Rev. Genet. 13, 77–83 (2011).

  • Quek, X. C. et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 43, D168–D173 (2015).

  • Volders, P. J. et al. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 43, 4363–4364 (2015).

  • Zhao, Y. et al. NONCODE 2016: an informative and valuable data source of long non-coding RNAs. Nucleic Acids Res. 44, D203–D208 (2016).

  • RNAcentral Consortium. RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res. 43, D123–D129 (2015).

  • ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  • Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).

  • Fullwood, M. J. & Ruan, Y. ChIP-based methods for the identification of long-range chromatin interactions. J. Cell Biochem. 107, 30–39 (2009).

  • Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015). This study uses Capture Hi-C to examine the long-range chromosome interactions of 22,000 human promoters.

  • Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in capture Hi-C data. Genome Biol. 17, 127 (2016).

  • Dickel, D. E. et al. Function-based identification of mammalian enhancers using site-specific integration. Nat. Methods 11, 566–571 (2014).

  • Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).

  • Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014).

  • Zerbino, D. R. et al. Ensembl regulation resources. Database (Oxford) 2016, 1–13 (2016).

  • de Wit, E. et al. CTCF binding polarity determines chromatin looping. Mol. Cell 60, 676–684 (2015).

  • Ong, C. T. & Corces, V. G. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 15, 234–246 (2014).

  • Gonzalez-Porta, M., Frankish, A., Rung, J., Harrow, J. & Brazma, A. Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol. 14, R70 (2013).

  • The GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  • Uhlen, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

  • Vogel, C. & Marcotte, E. M. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13, 227–232 (2012).

  • Battle, A. et al. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).

  • McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  • Dalgleish, R. et al. Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Med. 2, 24 (2010). This project provides insights into the relationship between gene annotation and the description of variation in the clinic.

  • Takahashi, H., Kato, S., Murata, M. & Carninci, P. CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. Methods Mol. Biol. 786, 181–200 (2012).

  • Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).

  • Fullwood, M. J. et al. An oestrogen-receptor-α-bound human chromatin interactome. Nature 462, 58–64 (2009).

  • Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).

  • Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).

  • Carver, T., Harris, S. R., Berriman, M., Parkhill, J. & McQuillan, J. A. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics 28, 464–469 (2012).

  • Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 14, 178–192 (2013).

  • Benson, D. A. et al. GenBank. Nucleic Acids Res. 41, D36–D42 (2013).

  • Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  • Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  • Lin, M. F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275–i282 (2011).

  • Nawrocki, E. P. et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 43, D130–D137 (2015).

  • Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014).

  • Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716 (2016).
