Quantifying similarity between motifs
Shobhit Gupta et al. Genome Biol. 2007.
Abstract
A common question within the context of de novo motif discovery is whether a newly discovered, putative motif resembles any previously discovered motif in an existing database. To answer this question, we define a statistical measure of motif-motif similarity, and we describe an algorithm, called Tomtom, for searching a database of motifs with a given query motif. Experimental simulations demonstrate the accuracy of Tomtom's E values and its effectiveness in finding similar motifs.
Figures

An aligned pair of similar motifs. The query and target motifs are both derived from JASPAR motif NF-Y, following the simulation protocol described in the text. Tomtom assigns an E value of 3.81 × 10^-10 to this particular match. The figure was created using a version of seqlogo [26], modified to display aligned pairs of logos.

Score distribution histogram for a query motif of length 12. The figure contains 12 histograms overlaid on top of each other. Each histogram corresponds to the frequency distribution of scores, for an offset of zero relative to a query motif of width 12. The first (red) histogram is for the alignment involving only the first query column, the next (light green) histogram relates to the first two query columns, and so on.
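The nested histograms described in this caption can be illustrated by repeated convolution: the score distribution for the first k query columns is the distribution for k-1 columns combined with the per-column distribution of column k. The sketch below is not Tomtom's code. It assumes column scores have already been discretized to integers and that target columns are drawn independently; the function names `extend_distribution` and `prefix_distributions` are hypothetical.

```python
from collections import Counter


def extend_distribution(dist, col_counts):
    """Convolve a running score distribution with one more column's score counts."""
    out = Counter()
    for score_so_far, n_so_far in dist.items():
        for col_score, n_col in col_counts.items():
            out[score_so_far + col_score] += n_so_far * n_col
    return out


def prefix_distributions(per_column_scores):
    """Return one histogram per query-column prefix (1 column, 2 columns, ...).

    per_column_scores[i] lists the integer scores that query column i obtains
    against every target column in the database (an assumed input format).
    """
    dist = Counter({0: 1})  # empty alignment: a single way to score zero
    histograms = []
    for scores in per_column_scores:
        dist = extend_distribution(dist, Counter(scores))
        histograms.append(dist)
    return histograms


if __name__ == "__main__":
    # Toy example: two query columns, each scoring 0 or 1 against two target columns.
    for width, hist in enumerate(prefix_distributions([[0, 1], [0, 1]]), start=1):
        print(width, dict(sorted(hist.items())))
```

For a query of width 12, a routine of this kind would yield 12 histograms, one per prefix, matching the overlay described in the caption.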

Accuracy of motif comparison P values. The figure plots the computed motif P value as a function of the empirical (rank-based) P value from searching shuffled query motifs against shuffled target motifs. The central line corresponds to y = x, and the two adjacent dotted lines correspond to y = 0.5x and y = 2x. The P values are computed using the Euclidean distance.
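As a rough illustration of the comparison in this caption, the sketch below computes a rank-based empirical P value from a set of null scores and checks whether a computed P value lies between the y = 0.5x and y = 2x lines. It assumes a distance-like score where smaller means more similar, and it uses a plus-one correction so the estimate is never zero; both choices, and the function names, are assumptions rather than details taken from the paper.

```python
def empirical_p_value(observed, null_scores):
    """Rank-based P value: fraction of null scores at least as good as observed.

    Assumes smaller scores indicate a better match (as for a distance).
    The +1 terms are an assumed correction that avoids a P value of zero.
    """
    n_as_good = sum(1 for s in null_scores if s <= observed)
    return (n_as_good + 1) / (len(null_scores) + 1)


def within_factor_of_two(computed_p, empirical_p):
    """True if the point (empirical_p, computed_p) lies between y = 0.5x and y = 2x."""
    return 0.5 * empirical_p <= computed_p <= 2.0 * empirical_p


if __name__ == "__main__":
    null = [0.5, 2.0, 3.0]  # toy scores from shuffled motif comparisons
    p_emp = empirical_p_value(1.0, null)
    print(p_emp, within_factor_of_two(0.4, p_emp))
```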

Measuring retrieval accuracy. Motif retrieval accuracy is estimated using simulated JASPAR motifs, as described in the text. The figure plots the percentage of correct query-target pairs (true positives) as a function of the percentage of incorrect pairs (false positives) as we traverse the list of query-target pairs sorted by Tomtom P value or by any of the other three methods of combining column-wise scores. The solid and dashed lines correspond to width-normalized scores (P values, arithmetic mean, and geometric mean), and the green dotted line represents the sum of column scores. This figure is for Euclidean distance (ED) at a sampling rate of S/8.
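The traversal described in this caption can be sketched as follows: sort all query-target pairs by score, walk down the list, and record the cumulative percentage of correct and incorrect pairs at each step. This is a minimal sketch that assumes lower scores rank first (as with P values) and ignores tied scores; `roc_points` is a hypothetical name.

```python
def roc_points(pairs):
    """Return (false-positive %, true-positive %) after each pair in ranked order.

    pairs: iterable of (score, is_correct), where lower scores are better.
    Ties are broken arbitrarily by the sort (an assumed simplification).
    """
    ranked = sorted(pairs, key=lambda p: p[0])
    n_pos = sum(1 for _, ok in ranked if ok)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = []
    for _, ok in ranked:
        if ok:
            tp += 1
        else:
            fp += 1
        fp_pct = 100.0 * fp / n_neg if n_neg else 0.0
        tp_pct = 100.0 * tp / n_pos if n_pos else 0.0
        points.append((fp_pct, tp_pct))
    return points


if __name__ == "__main__":
    toy = [(0.001, True), (0.2, False), (0.01, True), (0.5, False)]
    for point in roc_points(toy):
        print(point)
```

Running the same traversal once per scoring method (P value, arithmetic mean, geometric mean, sum) would give one curve per method for comparison.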

E value-based retrieval rate. The figure plots the percentage of query motifs that successfully matched the correct JASPAR target as a function of the number of sites used to create the query motif. Here 'success' means that the top-ranked motif is the correct target and has an E value less than 0.01. ALLR, average log-likelihood ratio; ED, Euclidean distance; FIET, Fisher-Irwin exact test; KLD, Kullback-Leibler divergence; PCC, Pearson correlation coefficient; PCST, Pearson χ² test; SW, Sandelin-Wasserman function.
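The success criterion in this caption can be written down directly. The sketch below assumes the E value is the match P value multiplied by the number of target motifs searched, which is the usual definition of an E value but is stated here as an assumption; the function names and the input format are hypothetical.

```python
def e_value(p_value, n_targets):
    """Expected number of matches this good by chance (assumed: P value x database size)."""
    return p_value * n_targets


def is_success(ranked_hits, correct_target, n_targets, threshold=0.01):
    """True if the top-ranked hit is the correct target with E value below threshold.

    ranked_hits: list of (target_name, p_value), best match first.
    """
    if not ranked_hits:
        return False
    top_name, top_p = ranked_hits[0]
    return top_name == correct_target and e_value(top_p, n_targets) < threshold


if __name__ == "__main__":
    hits = [("NF-Y", 1e-5), ("other", 0.2)]  # toy search result
    print(is_success(hits, "NF-Y", n_targets=100))
```

The retrieval rate plotted in the figure would then be the percentage of queries for which `is_success` holds at each number of sites.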