Comparison of Real Frequencies of Strings vs. the Expected Ones Reveals the Information Capacity of Macromoleculae

Michael G Sadovsky. J Biol Phys. 2003 Mar.

Abstract

The information capacity of nucleotide sequences is defined through the specific entropy of their frequency dictionary. This specific entropy is calculated against the reconstructed dictionary, which contains the most probable continuations of the shorter strings. The resulting measure distinguishes sequences both from random ones and from those with a high level of (rather simple) order. Some implications of the methodology for genetics, bioinformatics, and molecular biology are discussed.

Keywords: Markov model; dictionary; entropy; information capacity; ordered sequence; random sequence; reconstructed dictionary; specific entropy.
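
The abstract does not give the formula, so the following is only a minimal sketch of how such a measure could be computed. It assumes that the reconstructed dictionary is the maximum-entropy (Markov-like) extension of the shorter strings, f~(i1..iq) = f(i1..iq-1) * f(i2..iq) / f(i2..iq-1), and that the specific entropy is the relative entropy of the real frequencies against these expected ones. The function names `frequency_dictionary` and `specific_entropy`, and the closing of the sequence into a ring, are illustrative assumptions rather than details taken from the paper.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all strings of length q in seq.

    Assumption: the sequence is closed into a ring, so every position
    starts exactly one string and dictionaries of different lengths
    stay mutually consistent.
    """
    ring = seq + seq[:q - 1]
    counts = Counter(ring[i:i + q] for i in range(len(seq)))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def specific_entropy(seq, q):
    """Relative entropy of real q-string frequencies against the
    frequencies expected from the (q-1)-string dictionary.

    Assumed reconstruction (not quoted from the paper):
        expected(w) = f(w[:-1]) * f(w[1:]) / f(w[1:-1])
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    real = frequency_dictionary(seq, q)
    shorter = frequency_dictionary(seq, q - 1)
    # For q == 2 the shared inner string is empty and has frequency 1.
    inner = frequency_dictionary(seq, q - 2) if q > 2 else {"": 1.0}
    total = 0.0
    for word, f in real.items():
        expected = shorter[word[:-1]] * shorter[word[1:]] / inner[word[1:-1]]
        total += f * log(f / expected)
    return total


if __name__ == "__main__":
    periodic = "ACGT" * 10
    print(specific_entropy(periodic, 2))
    print(specific_entropy(periodic, 3))
```

Under these assumptions, a strictly periodic sequence such as `"ACGT" * 10` gives a positive value at q = 2, because pairs are far from the product of single-letter frequencies, and a value of zero at q = 3, because the triplets are fully predicted by the pairs.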
