DeepLoc 2.0: multi-label subcellular localization prediction using protein language models

  • Sat Jan 01 2022

Vineet Thumuluri et al. Nucleic Acids Res. 2022.

Abstract

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Graphical Abstract

DeepLoc 2.0 uses a transformer-based protein language model to predict multi-label subcellular localization and provides interpretability via the attention and sorting signal prediction.

Figure 1.

An example snippet from the results page on the webserver. The prediction summary, which lists the predicted subcellular localizations and sorting signals, can be downloaded as a comma-separated values (CSV) file from the top of the page. The image and the attention values of each plot can be downloaded separately. All predicted subcellular localization and sorting signal labels are listed, along with the prediction score table. The predicted localizations are highlighted in green in the table. If no score crosses the threshold, the label whose score is closest to the threshold is chosen. High values in the logo-like plot mark regions of the sequence that are important for localization prediction and may correspond to sorting signals. The plot is meant to serve as a guideline; specialized tools such as SignalP or TargetP can be used for a more detailed and accurate analysis of these signals.
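
The thresholding rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the webserver's code: the function name `select_labels`, the use of one threshold per label, the `>=` comparison, and the example label names and numbers are all assumptions made for the sketch.

```python
def select_labels(scores, thresholds):
    """Pick every label whose score reaches its threshold.

    scores and thresholds are dicts keyed by label name (hypothetical
    layout; the real score table may be organised differently).
    """
    chosen = [label for label, score in scores.items()
              if score >= thresholds[label]]
    if chosen:
        return chosen
    # Fallback: no score crossed its threshold, so return the single
    # label whose score falls short by the smallest margin.
    closest = min(scores, key=lambda label: thresholds[label] - scores[label])
    return [closest]


# Hypothetical scores: two labels pass, so both are reported.
print(select_labels(
    {"Nucleus": 0.7, "Cytoplasm": 0.6, "Mitochondrion": 0.1},
    {"Nucleus": 0.5, "Cytoplasm": 0.5, "Mitochondrion": 0.5},
))
```

Because the fallback always returns one label, every query sequence receives at least one predicted localization, while sequences with several scores above threshold keep all of them.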

Figure 2.

DeepLoc 2.0 uses a transformer-based protein language model to encode the input amino acid sequence. An interpretable attention pooling mechanism then produces a sequence representation. The two prediction heads use this representation to predict multiple labels for both the 10-type subcellular localization and the 9-type sorting signal prediction tasks. Source of cell diagram: https://commons.wikimedia.org/wiki/File:Simple_diagram_of_plant_cell_(blank).svg, attribution: domdomegg, CC BY 4.0 <https://creativecommons.org/licenses/by/4.0>, via Wikimedia Commons.
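
To make the pooling step in this caption concrete, the sketch below shows one common form of attention pooling followed by an independent sigmoid output per label. It is an illustration under stated assumptions, not the DeepLoc 2.0 implementation: scoring each residue by a dot product with a single vector (`query`), the function names `attention_pool` and `sigmoid_head`, and the toy two-dimensional embeddings are all hypothetical, and the weights here are hand-picked rather than learned.

```python
import math


def attention_pool(embeddings, query):
    """Collapse per-residue embeddings into one sequence vector.

    embeddings: list of per-residue vectors (lists of floats).
    query: scoring vector of the same dimension (assumed form of attention).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar score per residue.
    scores = [sum(q * x for q, x in zip(query, h)) for h in embeddings]
    # Softmax over sequence positions, shifted by the max for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the residue embeddings.
    dim = len(embeddings[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, embeddings))
              for d in range(dim)]
    return pooled, weights


def sigmoid_head(pooled, weight_rows, biases):
    """Multi-label head: one independent probability per label."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weight_rows, biases)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy example: two residues with two-dimensional embeddings.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
print(weights)  # the first residue receives almost all of the attention
```

The attention weights sum to one across the sequence, which is what allows them to be plotted position by position. Using a sigmoid per label rather than a softmax across labels lets several localizations be predicted for the same protein.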
