Combining results of multiple search engines in proteomics - PubMed
Review
Combining results of multiple search engines in proteomics
David Shteynberg et al. Mol Cell Proteomics. 2013 Sep.
Abstract
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
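The idea of unique identifications among a shared core can be illustrated with a minimal Python sketch. It assumes each engine's accepted results are available as a set of (spectrum ID, peptide) pairs. The function name, engine names, and toy data are hypothetical and are not taken from the review or from any of the tools it discusses. The sketch only partitions the PSMs and does not judge whether any identification is correct.

```python
def core_and_unique(psms_by_engine):
    """Split PSMs into a shared core and per-engine unique sets.

    psms_by_engine maps an engine name to a set of
    (spectrum_id, peptide) pairs accepted by that engine.
    """
    engines = list(psms_by_engine)
    # Core: PSMs reported by every engine.
    core = set.intersection(*(psms_by_engine[e] for e in engines))
    unique = {}
    for engine in engines:
        # Union of everything reported by the other engines.
        others = set().union(
            *(psms_by_engine[o] for o in engines if o != engine)
        )
        # Unique: PSMs reported by this engine and by no other.
        unique[engine] = psms_by_engine[engine] - others
    return core, unique


# Hypothetical toy input, for illustration only.
toy = {
    "engine_a": {("s1", "PEPTIDEK"), ("s2", "ELVISK"), ("s3", "LIVESK")},
    "engine_b": {("s1", "PEPTIDEK"), ("s2", "ELVISK"), ("s4", "SAMPLER")},
    "engine_c": {("s1", "PEPTIDEK"), ("s3", "LIVESK"), ("s5", "TESTK")},
}
core, unique = core_and_unique(toy)
```

In this toy input only spectrum `s1` is in the core, while `engine_b` and `engine_c` each contribute one PSM that no other engine reports.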
Figures

Comparison of the number of PSMs for all combinations of one through six search engines combined with iProphet. Each bar represents the correct PSMs that can be recovered from a given dataset using the computed scores to rank and filter the results while requiring decoy-estimated error rates of 0%, 0.5%, 1.0%, 1.5%, and 2.0%. Also shown are MSblender results combining InSpecT, MyriMatch, OMSSA, SEQUEST, and X!Tandem, as well as PepArML results combining Mascot, X!Tandem, X!Tandem + K-score, MyriMatch, OMSSA, SScore, and InSpecT.

Scatter plot showing the combined number of correctly identified PSMs at a 1% error rate plotted against the complementarity scores of all pairs of search engine combinations. Combining the two most complementary search engines does not necessarily lead to the best overall result, and combining the two least complementary search engines does not necessarily lead to the worst overall result.

Scatter plot showing the percentage of PSMs identified uniquely by a single search engine, out of the total PSMs identified by a single search engine, relative to the combination of the other five search engines. The x-coordinate shows the total number of PSMs identified by a single search engine analyzed with the TPP.

Comparison of the distinct peptide sequence identifications for three search engine combiners (MSblender, PepArML, and iProphet) at decoy-estimated error rates of 0%, 0.5%, 1.0%, 1.5%, and 2.0%. These error rates are denoted by the bottom whisker, bottom of the box, center line, top of the box, and top whisker, respectively. Also depicted on the right are the results of a spectral library search with SpectraST using the latest NIST spectral library, as well as the result when iProphet was used to combine SpectraST results with the combined sequence search engine results. Data points were inferred using the spline function, as not all of the tools allowed filtering at error rates below 1% as estimated by the decoys.

Receiver operating characteristic curves on the distinct peptide sequence level showing performance of iProphet on single search engine results versus the combination of all search engines. The spectral library search results are also included for comparison.
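The figure captions above report counts at decoy-estimated error rates. As a rough sketch of how such a count might be obtained, the Python example below ranks PSMs by score and keeps the largest set of target PSMs whose estimated error rate stays within a threshold. It assumes the simple estimate of decoy hits divided by target hits; the review page does not state which estimator was used, and the function name and toy scores are hypothetical. Score ties are not handled specially.

```python
def count_targets_at_error_rate(psms, max_error):
    """Count target PSMs accepted at a decoy-estimated error rate.

    psms is an iterable of (score, is_decoy) pairs, where a higher
    score means a more confident match. The error rate at each rank
    is estimated as decoys / targets (an assumption of this sketch).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank that still satisfies the threshold.
        if targets > 0 and decoys / targets <= max_error:
            best = targets
    return best


# Hypothetical toy scores, for illustration only.
toy_psms = [
    (0.99, False), (0.98, False), (0.97, False), (0.95, False),
    (0.90, True), (0.85, False), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True),
]
strict = count_targets_at_error_rate(toy_psms, 0.0)
relaxed = count_targets_at_error_rate(toy_psms, 0.25)
```

Running the same function over several thresholds yields one count per error rate, which is the kind of series the bar and box plots summarize.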