Fri Jan 01 2016

Fusing literature and full network data improves disease similarity computation

Ping Li et al. BMC Bioinformatics. 2016.

Abstract

Background: Identifying relatedness among diseases could deepen understanding of their underlying pathogenic mechanisms and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them was designed to use information from the entire protein interaction network, relying instead only on interactions involving disease-causing genes. Most previously published methods require gene-disease association data. Unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature.

Results: Among function-based methods, NetSim achieved the best performance, with an average AUC (area under the receiver operating characteristic curve) of 95.2 %. MedSim, whose performance was comparable to that of some function-based methods, achieved the highest average AUC among all semantic-based methods. Integrating MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4 %. We further studied the effectiveness of different data sources. For protein interaction data, quality was more important than volume. In contrast, a higher volume of gene-disease association data was more beneficial, even at lower reliability. Using a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5 % and 96.7 %, respectively.
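The average AUC figures reported above can be illustrated with a minimal sketch. It assumes, as the figure captions suggest but the abstract does not spell out, that similarity scores for benchmark disease pairs are treated as positives and scores for randomly drawn pairs as negatives, and that the AUC is averaged over several random sets. The function names `roc_auc` and `average_auc` and all scores below are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(benchmark_scores, random_sets):
    """Average the AUC of one benchmark set against several random sets.

    Assumption: benchmark pairs are positives, random pairs are negatives.
    """
    aucs = []
    for random_scores in random_sets:
        scores = list(benchmark_scores) + list(random_scores)
        labels = [1] * len(benchmark_scores) + [0] * len(random_scores)
        aucs.append(roc_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Toy usage with made-up similarity scores (not data from the study).
benchmark = [0.9, 0.2]
random_sets = [[0.8, 0.1], [0.1, 0.05]]
print(average_auc(benchmark, random_sets))
```

The pairwise formulation is quadratic in the number of pairs, which is acceptable for a sketch; a rank-based implementation would be preferable for large benchmark sets.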

Conclusions: Integrating biomedical literature and protein interaction network data can be an effective way to compute disease similarity. When disease-related gene data are insufficient, literature-based methods such as MedSim can be a valuable addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/.

Keywords: Disease similarity; MedNetSim; MedSim; NetSim; Random walk with Restart.
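The keywords list Random walk with Restart, and the abstract states that NetSim uses the entire protein interaction network, but neither gives the exact procedure. The sketch below is therefore only an illustration under stated assumptions: each disease is represented by a walk started from its associated genes on an undirected toy network, and two diseases are compared by the cosine of their steady-state profiles. The restart probability, the cosine comparison, the function names and the four-gene network are all hypothetical, not taken from the paper.

```python
import math


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seeds`.

    adj   : dict mapping each node to a list of its neighbours (undirected).
    seeds : non-empty set of nodes in `adj` where the walker restarts.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: jump back to the seed distribution.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: spread the remaining mass evenly over neighbours.
        for u in nodes:
            mass = (1.0 - restart) * p[u]
            if adj[u]:
                share = mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                nxt[u] += mass  # isolated node keeps its own mass
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


def cosine(p, q):
    """Cosine similarity of two probability profiles over the same nodes."""
    dot = sum(p[n] * q[n] for n in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


# Toy chain network A - B - C - D (hypothetical gene names).
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
profile_x = random_walk_with_restart(network, {"A"})  # disease X, gene A
profile_y = random_walk_with_restart(network, {"B"})  # disease Y, gene B
profile_z = random_walk_with_restart(network, {"D"})  # disease Z, gene D
print(cosine(profile_x, profile_y), cosine(profile_x, profile_z))
```

Because the walk spreads over the whole network, two diseases with no shared genes can still receive a nonzero similarity when their genes lie close together, which is the intuition behind using the full interaction network rather than only interactions that involve disease genes.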

Figures

Fig 1

Gene-disease association databases. a: The number of diseases, b: The number of associations between genes and diseases

Fig 2

Protein interaction datasets. a: The number of genes, b: The number of interactions between genes, *: The common protein-protein interactions

Fig 3

Overview of MedSim. DO: Human Disease Ontology database; UMLS: Unified Medical Language System

Fig 4

Performance of function-based methods. a ROC curves for the benchmark set and a random set. b Average AUC for the benchmark set and 50 random sets

Fig 5

Performance of semantic-based methods. a ROC curves for the experimental results on the benchmark set and a random set. b Average AUC for the benchmark set and 50 random sets. HPO_Res, HPO_Lin and HPO_Wang denote disease similarities computed with the Resnik, Lin and Wang methods, respectively, based on HPO

Fig 6

The impact of different data sources. a ROC curves for the experimental results on the benchmark set and a random set. b Average AUC for the benchmark set and 50 random sets

Fig 7

The average AUC of NetSim with different proportions of hPPIN sampled

Fig 8

An overview of disease similarity network (DSN) based on MedNetSim results. The graph was based on a force-directed layout using the similarity between diseases as attraction force. Nodes were colored according to the top-level DO category to which they belong

Fig 9

The sub-network around myasthenia gravis (a) and fibromyalgia (b). Nodes were colored according to membership in the top-level DO category. The thickness of the connections between the nodes reflects the degree of similarity
