DeepLoc 2.0: multi-label subcellular localization prediction using protein language models
Vineet Thumuluri et al. Nucleic Acids Res. 2022.
Abstract
The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.
© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
Figures

DeepLoc 2.0 uses a transformer-based protein language model to predict multi-label subcellular localization and provides interpretability via the attention and sorting signal prediction.

An example snippet from the results page on the webserver. The prediction summary, which consists of the predicted subcellular localization and sorting signals, is available for download as a comma-separated values (CSV) file at the top. The image or attention values of each plot can be downloaded separately. All the predicted subcellular localization and sorting signal labels are listed, along with the prediction score table. The predicted localizations in the table are highlighted in green. If no score crosses the threshold, the label whose score is closest to the threshold is chosen. High values in the logo-like plot signify regions of the sequence that are important for localization prediction and may correspond to sorting signals. The plot is meant to serve as a guideline; specialized tools such as SignalP or TargetP can be used for a more detailed and accurate analysis of these signals.
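The label selection rule described in this caption can be sketched as follows. This is a minimal illustration, not the webserver's code: the function name `select_labels`, the use of one threshold per label, the `>=` comparison, and the example scores and threshold values are all assumptions made for the sketch.

```python
def select_labels(scores, thresholds):
    """Pick every label whose score reaches its threshold.

    scores, thresholds: dicts keyed by label name (hypothetical inputs).
    If no label reaches its threshold, fall back to the single label
    whose score falls short of its threshold by the smallest amount.
    """
    chosen = [label for label, score in scores.items()
              if score >= thresholds[label]]
    if chosen:
        return chosen
    # Fallback: no score crossed the threshold, so take the closest one.
    closest = min(scores, key=lambda label: thresholds[label] - scores[label])
    return [closest]


# Illustrative values only; these are not real DeepLoc 2.0 thresholds.
thresholds = {"Nucleus": 0.5, "Cytoplasm": 0.5, "Mitochondrion": 0.6}

multi = select_labels(
    {"Nucleus": 0.7, "Cytoplasm": 0.6, "Mitochondrion": 0.1}, thresholds)
fallback = select_labels(
    {"Nucleus": 0.40, "Cytoplasm": 0.30, "Mitochondrion": 0.45}, thresholds)
print(multi)     # two labels pass their thresholds
print(fallback)  # no label passes, so exactly one label is returned
```

The fallback guarantees that every protein receives at least one predicted label, which matches the behaviour the caption describes.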

DeepLoc 2.0 uses a transformer-based protein language model to encode the input amino acid sequence. An interpretable attention pooling mechanism then produces a sequence representation. The two prediction heads use this representation to predict multiple labels for both the 10-type subcellular localization and the 9-type sorting signal prediction tasks. Source of cell diagram: https://commons.wikimedia.org/wiki/File:Simple_diagram_of_plant_cell_(blank).svg, attribution: domdomegg, CC BY 4.0 <https://creativecommons.org/licenses/by/4.0>, via Wikimedia Commons.
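The data flow in this caption (per-residue embeddings, attention pooling, two multi-label heads) can be sketched with NumPy. This is an illustration under stated assumptions, not the published architecture: the embeddings are random stand-ins for language model output, the attention scores come from a single hypothetical scoring vector `w`, and each head is assumed to be one linear layer followed by a sigmoid. Only the output sizes (10 localizations, 9 sorting signals) come from the caption.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_pool(H, w):
    """Collapse per-residue embeddings H (L, d) into one vector (d,).

    w (d,) is a hypothetical learned scoring vector. The returned
    attention weights (L,) sum to 1 and can be plotted along the
    sequence to show which residues drive the prediction.
    """
    attn = softmax(H @ w)
    pooled = attn @ H
    return pooled, attn


rng = np.random.default_rng(0)
seq_len, dim = 12, 8                    # toy sizes, chosen for the sketch
H = rng.normal(size=(seq_len, dim))     # stand-in for language model output
w = rng.normal(size=dim)
W_loc = rng.normal(size=(dim, 10))      # assumed linear localization head
W_sig = rng.normal(size=(dim, 9))       # assumed linear sorting signal head

pooled, attn = attention_pool(H, w)
# Independent sigmoids let several labels be active at once (multi-label).
loc_scores = sigmoid(pooled @ W_loc)
sig_scores = sigmoid(pooled @ W_sig)
print(attn.shape, loc_scores.shape, sig_scores.shape)
```

Using a sigmoid per label rather than a softmax across labels is what allows a protein to be assigned to more than one compartment in this sketch.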