Génie: literature-based gene prioritization at multi genomic scale
Nucleic Acids Res. 2011 Jul;39(Web Server issue):W455-61.
doi: 10.1093/nar/gkr246. Epub 2011 May 23.
- PMID: 21609954
- PMCID: PMC3125729
- DOI: 10.1093/nar/gkr246
Jean-Fred Fontaine et al. Nucleic Acids Res. 2011 Jul.
Abstract
Biomedical literature is traditionally used to inform scientists of the relevance of genes to a research topic. However, many genes, especially from poorly studied organisms, are not discussed in the literature. Moreover, manually and comprehensively summarizing the literature attached to the genes of an organism is generally impossible because of the large number of genes and abstracts involved. We introduce the novel Génie algorithm, which overcomes these problems by evaluating the literature attached to all genes in a genome and to their orthologs according to a selected topic. Génie showed high precision (up to 100%) and performed best in comparison with other algorithms in most of the benchmarks, especially when high sensitivity was required. Moreover, the prioritization of zebrafish genes involved in heart development, using human and mouse orthologs, showed high enrichment in differentially expressed genes from microarray experiments. The Génie web server supports hundreds of species and millions of genes, and offers novel functionalities. Typical run times below one minute, even when analyzing the human genome with hundreds of thousands of literature records, allow the use of Génie in routine lab work.
Availability: http://cbdm.mdc-berlin.de/tools/genie/.
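The abstract describes evaluating the literature attached to each gene together with the literature of its orthologs. This lets poorly studied genes inherit abstracts from better-studied species. The Python sketch below shows one way such an expansion could work, and it is not Génie's implementation.

It assumes NCBI-style data already loaded into dictionaries:
- gene to PMIDs, as in NCBI Gene;
- gene to ortholog group, as in HomoloGene;
- gene to NCBI Taxonomy ID.

The function name expand_with_orthologs, the gene names and the PMIDs are hypothetical.

```python
from collections import defaultdict


def expand_with_orthologs(gene2pmids, gene2group, gene2taxon, target_taxon, source_taxa):
    """Pool each target-species gene's PMIDs with those of its orthologs.

    Hypothetical helper (not Génie's code): gene2pmids maps gene -> PMIDs,
    gene2group maps gene -> ortholog group ID (HomoloGene-like),
    gene2taxon maps gene -> NCBI Taxonomy ID.
    """
    # Index each ortholog group by its member genes.
    members = defaultdict(list)
    for gene, group in gene2group.items():
        members[group].append(gene)

    source_taxa = set(source_taxa)
    expanded = {}
    for gene, taxon in gene2taxon.items():
        if taxon != target_taxon:
            continue  # only genes of the queried genome are ranked
        # Start from the gene's own literature (empty for unstudied genes).
        pmids = set(gene2pmids.get(gene, ()))
        group = gene2group.get(gene)
        if group is not None:
            for other in members[group]:
                # Add abstracts of orthologs from the user-selected species only.
                if gene2taxon.get(other) in source_taxa:
                    pmids |= set(gene2pmids.get(other, ()))
        expanded[gene] = pmids
    return expanded


# Toy data: gene names and PMIDs are made up; taxonomy IDs are NCBI's
# (zebrafish 7955, human 9606, mouse 10090, rat 10116).
gene2taxon = {"zf1": 7955, "zf2": 7955, "hs1": 9606, "mm1": 10090, "rn1": 10116}
gene2group = {"zf1": "G1", "hs1": "G1", "mm1": "G1", "rn1": "G1"}
gene2pmids = {"hs1": [1, 2], "mm1": [3], "rn1": [4], "zf2": [5]}
result = expand_with_orthologs(gene2pmids, gene2group, gene2taxon, 7955, {9606, 10090})
print(result)  # {'zf1': {1, 2, 3}, 'zf2': {5}}
```

In the toy data, the zebrafish gene zf1 has no literature of its own. It inherits the abstracts of its human and mouse orthologs. The rat ortholog is ignored because rat was not among the selected species.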
Figures

Flow chart of the Génie web tool and algorithm. As an example, a user could query human genes related to a disease or a molecular pathway using chicken and rat orthologs; the use of orthology information is optional. Data are extracted from four NCBI databases: Taxonomy, Gene, MEDLINE and HomoloGene. Because the literature retrieved for the topic may be incomplete, it is used to train a text-mining classifier that selects the relevant gene literature. The output gene list (human genes in this example) is ranked using Fisher’s statistics.
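The flow chart states that the output gene list is ranked using Fisher's statistics. The sketch below assumes a one-sided Fisher's exact test on a 2×2 table. One axis is the gene's abstracts versus all other gene-linked abstracts. The other axis is classifier-selected versus not selected.

It also assumes the text-mining classifier has already labelled every abstract. The counts, the function names fisher_upper_tail and rank_genes, and ranking by ascending p-value are illustrative choices, not the published Génie scoring.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact test p-value, P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: all gene-linked abstracts; K: abstracts the classifier marked relevant;
    n: abstracts linked to this gene; k: this gene's abstracts marked relevant.
    """
    # Sum the hypergeometric probabilities of seeing k or more relevant abstracts.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def rank_genes(counts, N, K):
    """Rank genes by ascending p-value; counts maps gene -> (k relevant, n total).

    Assumption: a smaller p-value means a stronger topic association;
    Génie's own confidence score may be defined differently.
    """
    scored = [(gene, fisher_upper_tail(k, n, K, N)) for gene, (k, n) in counts.items()]
    # Most significant first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Toy corpus: 10 gene-linked abstracts, 4 of them classified as topic-relevant.
ranking = rank_genes({"A": (3, 3), "B": (1, 3), "C": (1, 1)}, N=10, K=4)
for gene, p in ranking:
    print(gene, round(p, 4))
```

Gene A, whose three abstracts were all classified as relevant, ranks first. Gene B, with only one relevant abstract out of three, ranks last. Using only math.comb keeps the example dependency-free. A library routine such as SciPy's Fisher's exact test would serve the same purpose.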

Benchmarks. (a) Génie confidence scores versus log2 fold expression changes for all up-regulated probes (at least 2-fold expression change) in a zebrafish microarray data set comparing hearts from 3-day-old zebrafish embryos with whole-body tissue. All probes with a positive confidence score were selected by Génie using zebrafish literature expanded with mouse and human orthologs (red diamonds and black crosses). Probes also selected by Génie using only zebrafish-related abstracts are plotted with black crosses. Genes not selected by Génie have a score of zero (blue circles). The scores and gene expression fold changes for each gene are available as Supplementary Table S6. (b) Precision when predicting differentially expressed genes using the gene ranks given by Génie. In the zebrafish microarray data analysis, differentially expressed genes were selected using an FDR < 0.01 and a minimum 2-fold expression change between heart and body samples. (c) Precision–recall plots showing the performance of Génie (red curves), Fable (blue curves) and PolySearch (black curves) when ranking genes from eight randomly chosen KEGG pathways. The three tools were used with default parameters. PolySearch returned no results for two pathways, drug metabolism cytochrome P450 and fructose mannose metabolism (see Supplementary Methods). Génie was run without orthology expansion of the literature.
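Panels (b) and (c) evaluate a ranked gene list against a reference set: differentially expressed genes or KEGG pathway members. The sketch below shows how such a reference set and the precision–recall points could be computed, under two assumptions:
- The caption gives only an FDR threshold, so the Benjamini-Hochberg procedure is an assumed choice.
- The 2-fold change is taken in either direction (|log2 fold change| ≥ 1).

All function names and toy values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the running minimum of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(genes, pvalues, log2fc, fdr=0.01, min_log2fc=1.0):
    """Genes with adjusted p < fdr and |log2 fold change| >= 1 (assumed: either direction)."""
    qvalues = bh_adjust(pvalues)
    return {g for g, q, fc in zip(genes, qvalues, log2fc)
            if q < fdr and abs(fc) >= min_log2fc}


def precision_recall(ranked_genes, positives):
    """(precision, recall) after each cutoff k of a ranked gene list."""
    positives = set(positives)
    if not positives:
        raise ValueError("need at least one positive gene")
    hits, curve = 0, []
    for k, gene in enumerate(ranked_genes, start=1):
        if gene in positives:
            hits += 1
        curve.append((hits / k, hits / len(positives)))
    return curve


# Toy data: "b" fails the fold-change filter and "c" fails the FDR filter.
de = select_de_genes(["a", "b", "c", "d"], [1e-6, 1e-6, 0.5, 1e-5], [2.0, 0.5, 3.0, -1.5])
curve = precision_recall(["d", "x", "a", "b"], de)
print(sorted(de), curve)
```

Precision at cutoff k is the fraction of the top k ranked genes that belong to the reference set. Recall is the fraction of the reference set recovered by that cutoff. Plotting the pairs gives a curve like those in panel (c).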
Similar articles
- MILANO--custom annotation of microarray results using automatic literature searches. Rubinstein R, Simon I. BMC Bioinformatics. 2005 Jan 20;6:12. doi: 10.1186/1471-2105-6-12. PMID: 15661078.
- MedlineRanker: flexible ranking of biomedical literature. Fontaine JF, Barbosa-Silva A, Schaefer M, Huska MR, Muro EM, Andrade-Navarro MA. Nucleic Acids Res. 2009 Jul;37(Web Server issue):W141-6. doi: 10.1093/nar/gkp353. Epub 2009 May 8. PMID: 19429696.
- Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, Maitland A, Mostafavi S, Montojo J, Shao Q, Wright G, Bader GD, Morris Q. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W214-20. doi: 10.1093/nar/gkq537. PMID: 20576703.
- Candidate gene prioritization. Masoudi-Nejad A, Meshkin A, Haji-Eghrari B, Bidkhori G. Mol Genet Genomics. 2012 Sep;287(9):679-98. doi: 10.1007/s00438-012-0710-z. Epub 2012 Aug 15. PMID: 22893106. Retracted. Review.
- Functional genomics tools for the analysis of zebrafish pigment. Pickart MA, Sivasubbu S, Nielsen AL, Shriram S, King RA, Ekker SC. Pigment Cell Res. 2004 Oct;17(5):461-70. doi: 10.1111/j.1600-0749.2004.00189.x. PMID: 15357832. Review.
Cited by
- Paredes-Sánchez FA, Sifuentes-Rincón AM, Segura Cabrera A, García Pérez CA, Parra Bracamonte GM, Ambriz Morales P. BMC Genet. 2015 Jul 22;16:91. doi: 10.1186/s12863-015-0247-3. PMID: 26198337.
- Liu Y, Liang Y, Wishart D. Nucleic Acids Res. 2015 Jul 1;43(W1):W535-42. doi: 10.1093/nar/gkv383. Epub 2015 Apr 29. PMID: 25925572.
- Cheung WA, Ouellette BF, Wasserman WW. Genome Med. 2012 Sep 28;4(9):75. doi: 10.1186/gm376. eCollection 2012. PMID: 23021552.
- Performance Assessment of the Network Reconstruction Approaches on Various Interactomes. Arici MK, Tuncbag N. Front Mol Biosci. 2021 Oct 5;8:666705. doi: 10.3389/fmolb.2021.666705. eCollection 2021. PMID: 34676243.
- Watford SM, Grashow RG, De La Rosa VY, Rudel RA, Friedman KP, Martin MT. Comput Toxicol. 2018 Aug;7:46-57. doi: 10.1016/j.comtox.2018.06.003. Epub 2018 Jun 19. PMID: 32274464.