Protein contact prediction using metagenome sequence data and residual neural networks
Wed Jan 01 2020
Qi Wu et al. Bioinformatics. 2020.
Abstract
Motivation: Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from poorly populated families do not have a sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from metagenome sequencing projects.
Results: Based on the improved MSA constructed from metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By exploiting the symmetry of the contact map, we reduced the number of matrices to 231, which makes training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts with other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant, with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable to the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, the improved feature design in DeepMSA and the optimized training with residual neural networks.
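The reduction from 441 to 231 matrices can be illustrated with a small sketch. It assumes a 21-symbol alphabet (20 amino acids plus gap), so that 21 × 21 = 441 residue-type pairs, of which 21 × 22 / 2 = 231 are unordered. The abstract does not say exactly how DeepMSA selects its channels, so the upper-triangle selection below is only one plausible reading, and all function names are hypothetical.

```python
import numpy as np

# Assumption: 20 amino acids plus a gap symbol, giving 21 residue types.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
Q = len(ALPHABET)

def one_hot_msa(msa):
    """Encode equal-length aligned sequences as an (N, L, Q) one-hot array."""
    index = {c: k for k, c in enumerate(ALPHABET)}
    x = np.zeros((len(msa), len(msa[0]), Q))
    for s, seq in enumerate(msa):
        for i, c in enumerate(seq):
            x[s, i, index.get(c, Q - 1)] = 1.0  # unknown symbols treated as gap
    return x

def covariance_features(msa):
    """cov[i, j, a, b] = f_ij(a, b) - f_i(a) * f_j(b), shape (L, L, Q, Q)."""
    x = one_hot_msa(msa)
    f_i = x.mean(axis=0)                                   # single-site frequencies
    f_ij = np.einsum("sia,sjb->ijab", x, x) / x.shape[0]   # pair frequencies
    return f_ij - np.einsum("ia,jb->ijab", f_i, f_i)

def reduce_by_symmetry(cov):
    """Keep only residue-type pairs with a <= b: 441 channels become 231."""
    a, b = np.triu_indices(Q)
    return cov[:, :, a, b]

msa = ["ARND-", "ARNE-", "GRND-", "ARQDC"]   # toy alignment, not real data
cov = covariance_features(msa)
full = cov.reshape(cov.shape[0], cov.shape[1], Q * Q)
reduced = reduce_by_symmetry(cov)
print(full.shape, reduced.shape)             # (5, 5, 441) (5, 5, 231)
```

Nothing is lost in this sketch because cov[i, j, a, b] equals cov[j, i, b, a]: each dropped channel (a > b) at position (i, j) reappears as the kept channel (b, a) at position (j, i).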
Availability and implementation: http://yanglab.nankai.edu.cn/mappred/.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Figures

The architecture of the proposed methods. (A) Flowchart of MapPred for contact map prediction. (B) The MSA generator, which constructs an MSA for a protein sequence. (C) Structure of the network used in DeepMSA. (D) Structure of each residual block.

The comparison of DeepMSA with DeepCov on the three benchmark datasets

Head-to-head comparisons between the precisions (%) of DeepMSA and DeepCov on three benchmark datasets

The precisions of DeepMSA on the benchmark datasets with MSAs enriched with metagenomic sequences at varying bit score thresholds

The precisions of CCMpred and DeepMSA on the benchmark datasets with MSAs generated from the Uniclust30 and the MetaDB databases
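
The captions above report precision without stating on this page how it is computed. A common convention in the contact prediction literature is to score the k highest-ranked residue pairs above a minimum sequence separation and report the fraction that are true contacts. The sketch below follows that convention. The function name, the default separation and the toy matrices are assumptions for illustration, not values from the paper.

```python
import numpy as np

def top_k_precision(pred, true, k, min_sep=6):
    """Fraction of the k top-scoring pairs (j - i >= min_sep) that are true contacts.

    pred: (L, L) array of predicted contact scores.
    true: (L, L) boolean array of observed contacts.
    """
    i, j = np.triu_indices(pred.shape[0], k=min_sep)      # pairs with j - i >= min_sep
    order = np.argsort(-pred[i, j], kind="stable")[:k]    # highest scores first
    return float(true[i[order], j[order]].mean())

# Toy 8-residue example; only pairs (0,6), (0,7) and (1,7) pass min_sep=6.
L = 8
pred = np.zeros((L, L))
true = np.zeros((L, L), dtype=bool)
pred[0, 6], pred[0, 7], pred[1, 7] = 0.9, 0.8, 0.7
true[0, 6] = true[1, 7] = True

print(top_k_precision(pred, true, k=1))   # 1.0
print(top_k_precision(pred, true, k=2))   # 0.5
```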