Fusing literature and full network data improves disease similarity computation
Fri Jan 01 2016
Ping Li et al. BMC Bioinformatics. 2016.
Abstract
Background: Identifying relatedness among diseases could deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them was designed to use information from the entire protein interaction network, relying instead only on those interactions involving disease-causing genes. Most previously published methods require gene-disease association data. Unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature.
Results: Among function-based methods, NetSim achieved the best performance. Its average AUC (area under the receiver operating characteristic curve) reached 95.2%. MedSim, whose performance was comparable even to some function-based methods, achieved the highest average AUC among all semantic-based methods. Integration of MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4%. We further studied the effectiveness of different data sources. We found that the quality of protein interaction data was more important than its volume. In contrast, a higher volume of gene-disease association data was more beneficial, even at lower reliability. Using a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5% and 96.7%, respectively.
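To make the AUC figures above concrete, the following is a minimal sketch of how an AUC can be computed from similarity scores. It assumes that benchmark disease pairs act as positives and randomly drawn pairs as negatives; the abstract does not spell out the evaluation protocol, so the function name `auc` and the scores below are hypothetical and purely illustrative.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive pair receives the higher similarity score (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores, not taken from the paper.
benchmark_pairs = [0.9, 0.8, 0.4]     # disease pairs assumed to be truly related
random_pairs = [0.7, 0.3, 0.2, 0.1]   # randomly drawn disease pairs

# 11 of the 12 (benchmark, random) comparisons are ranked correctly.
print(auc(benchmark_pairs, random_pairs))
```

An AUC of 1.0 would mean every benchmark pair outscores every random pair, while 0.5 corresponds to chance-level ranking.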
Conclusions: Integrating biomedical literature and the protein interaction network can be an effective way to compute disease similarity. When sufficient disease-related gene data are lacking, literature-based methods such as MedSim can be a valuable addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/.
Keywords: Disease similarity; MedNetSim; MedSim; NetSim; Random walk with Restart.
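The keywords name Random walk with Restart, a standard technique for propagating scores over an entire network rather than only over direct neighbors. The sketch below shows the generic algorithm on a toy graph. The abstract does not describe how NetSim applies it, so the function name, the restart probability, and the five-protein network are assumptions made for illustration only.

```python
def random_walk_with_restart(neighbors, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on an undirected graph.

    neighbors: dict mapping each node to a list of adjacent nodes
    seeds:     set of nodes the walker jumps back to when it restarts
    Returns a dict of steady-state visiting probabilities per node.
    """
    nodes = list(neighbors)
    # The restart distribution is uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbor.
        for n in nodes:
            if not neighbors[n]:
                continue
            share = (1.0 - restart) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        # Stop once the distribution no longer changes (L1 distance).
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy interaction network: a chain P1 - P2 - P3 - P4 - P5.
toy_network = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}

scores = random_walk_with_restart(toy_network, seeds={"P1"})
for protein, score in sorted(scores.items()):
    print(protein, round(score, 4))
```

Every protein in the toy network receives a nonzero score, including those several steps away from the seed, which illustrates how a walk over the full network can use interactions that do not directly involve a disease gene.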
Figures

Gene-disease association databases. a: The number of diseases, b: The number of associations between genes and diseases

Protein interaction datasets. a: The number of genes, b: The number of interactions between genes, *: The common protein-protein interactions

Overview of MedSim. DO: Human Disease Ontology database; UMLS: Unified Medical Language System

Performance of function-based methods. a ROC curves for the benchmark set and a random set. b Average AUC for the benchmark set and 50 random sets

Performance of semantic-based methods. a ROC curves for the experimental results on the benchmark set and a random set. b Average AUC for the benchmark set and 50 random sets. HPO_Res, HPO_Lin and HPO_Wang denote disease similarities computed with the Resnik, Lin and Wang measures based on HPO, respectively

The impact of different data sources. a ROC curves for the experimental results on the benchmark set and a random set. b Average AUC for the benchmark set and 50 random sets

The average AUC of NetSim with different proportions of hPPIN sampled

An overview of disease similarity network (DSN) based on MedNetSim results. The graph was based on a force-directed layout using the similarity between diseases as attraction force. Nodes were colored according to the top-level DO category to which they belong

The sub-network around myasthenia gravis (a) and fibromyalgia (b). Nodes were colored according to membership in the top-level DO category. The thickness of the connections between the nodes reflects the degree of similarity