The difficulty of identifying genes in anonymous vertebrate sequences
J M Claverie et al. Comput Chem. 1997.
Abstract
The identification of genes in newly determined vertebrate genomic sequences can range from a trivial to an impossible task. In a statistical preamble, we show that the individual features on which gene identification can be rigorously based (promoter signals, splice sites, open reading frames, etc.) are each "insignificant" on their own. The practical identification of genes thus ultimately depends on their resemblance to genes already present in sequence databases or incorporated into training sets. The inherent conservatism of the currently popular methods (database similarity search, GRAIL) will greatly limit our capacity for making unexpected biological discoveries from increasingly abundant genomic data. Beyond a very limited subset of trivial cases, the automated interpretation of genomic data (i.e. without experimental validation) is still a myth. On the other hand, characterizing the 60,000 to 100,000 genes thought to be hidden in the human genome by means of individual experiments is not feasible. Thus, it appears that our only hope of turning genome data into genome information rests on drastic progress in the way we identify and analyse genes in silico.
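The claim that an open reading frame is "insignificant" as a standalone signal can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes independent, equiprobable bases (so 3 of the 64 codons are stops), and the function names `prob_open` and `expected_chance_orfs` are hypothetical helpers introduced only for this illustration.

```python
# Illustrative sketch only. Assumption (not from the paper): bases are
# independent and equiprobable, so each random codon is a stop with
# probability 3/64.
STOP_CODONS = 3
TOTAL_CODONS = 64


def prob_open(n_codons: int) -> float:
    """Probability that n_codons consecutive random codons contain no stop."""
    p_sense = (TOTAL_CODONS - STOP_CODONS) / TOTAL_CODONS
    return p_sense ** n_codons


def expected_chance_orfs(seq_len_bp: int, n_codons: int) -> float:
    """Rough expected number of stop-free windows of n_codons codons.

    Every start position on a strand belongs to exactly one of the three
    frames, so counting start positions covers all three frames; the
    factor 2 accounts for the reverse strand. Windows overlap, so this
    overestimates the number of distinct open stretches.
    """
    positions = max(seq_len_bp - 3 * n_codons + 1, 0)
    return 2 * positions * prob_open(n_codons)


if __name__ == "__main__":
    # 50 codons (150 bp) is an arbitrary, short exon-sized window.
    for n in (50, 100, 200):
        print(n, prob_open(n), expected_chance_orfs(100_000, n))
```

Under these assumptions, a stop-free stretch of 50 codons arises by chance in roughly 9% of windows, so short open stretches are expected in abundance in any long random sequence. Real genomes have biased base composition, which changes the numbers but not the qualitative point.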