Learning important features from multi-view data to predict drug side effects

Tue Jan 01 2019

Xujun Liang et al. J Cheminform. 2019.

Abstract

Drug side effects are one of the most crucial issues in pharmacological development. Because current experimental and clinical methods for detecting side effects have many limitations, numerous computational algorithms have been developed to predict side effects from different types of drug information. However, methods that can integrate heterogeneous data to predict side effects while also selecting important features are still lacking. Here, we propose a novel computational framework based on multi-view and multi-label learning for side effect prediction. Four types of drug features (chemical substructures, protein domains, gene ontology terms and gene expression changes) are collected, and a graph model is constructed from each feature profile. All the single-view graphs are then combined to regularize the linear regression functions that describe the relationships between drug features and side effect labels. L1 penalties are imposed on the regression coefficient matrices to select features relevant to side effects. The correlations between side effect labels are also incorporated into the model through graph Laplacian regularization. Experimental results show that the proposed method not only gives more accurate side effect predictions but also selects side-effect-related drug features from heterogeneous data. Case studies further illustrate the utility of the method for predicting drug side effects.
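
The abstract describes the model only in words. As a hedged illustration, the pure-Python sketch below implements one plausible reading of it; it is not the authors' multi-LRSL code. Assumptions not taken from the paper: each view's drug graph uses Jaccard similarity of binary feature profiles; the single-view graphs are averaged with equal weights; the label graph uses Jaccard similarity between side effect label columns; the views are concatenated so that one L1 penalty with a shared weight covers every view's coefficient matrix; and the objective is minimised with proximal gradient descent (ISTA), a swapped-in solver. The names `fit`, `predict`, `alpha`, `beta` and `lam` are hypothetical.

```python
# Hypothetical sketch, not the authors' multi-LRSL implementation.
# Assumed objective, with X = concatenated views and F = XW:
#   1/2 ||F - Y||^2 + alpha/2 tr(F^T L_d F) + beta/2 tr(F L_s F^T) + lam ||W||_1
# minimised by proximal gradient descent (ISTA).


def matmul(a, b):
    # Dense product of list-of-lists matrices a (n x k) and b (k x m).
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def jaccard_graph(rows):
    # Assumption: binary profiles; edge weight = Jaccard similarity, no self-loops.
    n = len(rows)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                inter = sum(1 for a, b in zip(rows[i], rows[j]) if a and b)
                union = sum(1 for a, b in zip(rows[i], rows[j]) if a or b)
                w[i][j] = inter / union if union else 0.0
    return w


def laplacian(w):
    # Unnormalised graph Laplacian L = D - W.
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def soft(v, t):
    # Soft-thresholding, the proximal operator of t*|v|; it yields exact zeros.
    return max(v - t, 0.0) - max(-v - t, 0.0)


def fit(views, y, alpha=0.1, beta=0.1, lam=0.01, iters=500):
    """views: per-view drug x feature matrices (same drug order);
    y: drug x side-effect 0/1 label matrix. Returns W (features x labels)."""
    x = [sum(rows, []) for rows in zip(*views)]  # concatenate the views per drug
    graphs = [jaccard_graph(v) for v in views]
    n, k = len(x), len(y[0])
    # Assumption: equal-weight combination of the single-view drug graphs.
    w_d = [[sum(g[i][j] for g in graphs) / len(graphs) for j in range(n)] for i in range(n)]
    l_d = laplacian(w_d)
    l_s = laplacian(jaccard_graph(transpose(y)))  # label-correlation graph
    # Step 1/L from a crude Lipschitz bound (||L||_2 <= max absolute row sum).
    fro2 = sum(v * v for row in x for v in row)
    norm_d = max(sum(abs(v) for v in row) for row in l_d)
    norm_s = max(sum(abs(v) for v in row) for row in l_s)
    step = 1.0 / (fro2 * (1.0 + alpha * norm_d + beta * norm_s))
    xt = transpose(x)
    w = [[0.0] * k for _ in xt]
    for _ in range(iters):
        f = matmul(x, w)
        lf, fl = matmul(l_d, f), matmul(f, l_s)
        # Gradient of the smooth part: X^T [(F - Y) + alpha L_d F + beta F L_s].
        r = [[f[i][j] - y[i][j] + alpha * lf[i][j] + beta * fl[i][j]
              for j in range(k)] for i in range(n)]
        g = matmul(xt, r)
        w = [[soft(wv - step * gv, step * lam) for wv, gv in zip(wr, gr)]
             for wr, gr in zip(w, g)]
    return w


def predict(views, w):
    # Linear predictor F = XW on the concatenated views.
    x = [sum(rows, []) for rows in zip(*views)]
    return matmul(x, w)
```

For example, `fit([[[2, 0], [0, 1]]], [[1], [0]], alpha=0.0, beta=0.0, lam=0.4)` keeps the coefficient of the second feature, which carries no signal for the label, at exactly zero. Soft-thresholding was chosen for this sketch because it produces exact zeros, which is what makes the L1 penalty act as feature selection.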

Keywords: Feature selection; Heterogeneous data integration; Side effect prediction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1

Different drug feature profiles contain both consistent and complementary information related to side effects. Drugs are clustered according to the similarities calculated from each feature profile or from the side effect labels. The blocks of drugs along the diagonals are identified with the R package ‘dynamicTreeCut’ [67]. Overlaps between the blocks in each feature-based drug similarity matrix and the blocks in the side-effect-based drug similarity matrix are assessed with Fisher’s exact test (p-value < 0.05), and significantly overlapping blocks are marked by coloured rectangles in the heat-maps. Purple rectangles indicate side effect blocks that overlap with blocks in one of the feature similarity matrices (for example, block e1 overlaps with block a1 in the chemical similarity matrix). Green rectangles indicate side effect blocks that overlap with blocks in two or three feature similarity matrices (for example, block e2 overlaps with block a2 in the chemical similarity matrix and block d2 in the gene expression similarity matrix). Red rectangles indicate side effect blocks that overlap with blocks in all feature similarity matrices (for example, block e3 overlaps with blocks a3, b3, c3 and d3). The colour scale shows similarity from 0 (blue) to 1 (red).
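
The block-overlap test in the Fig. 1 caption can be sketched as follows. This is a hedged illustration: it assumes blocks are sets of drug indices drawn from `n_drugs` drugs and that the test is one-sided (enrichment of overlap), which the caption does not state. `overlap_pvalue` and `rectangle_colour` are hypothetical names; the colour rule only encodes the caption's legend.

```python
from math import comb


def overlap_pvalue(block_a, block_b, n_drugs):
    """One-sided Fisher's exact test for the overlap of two drug blocks.

    Assumption: one-sided test. This equals the hypergeometric upper tail,
    P(overlap >= observed) if block_b were drawn at random from n_drugs drugs.
    """
    a, b = set(block_a), set(block_b)
    k = len(a & b)
    tail = sum(
        comb(len(a), i) * comb(n_drugs - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return tail / comb(n_drugs, len(b))


def rectangle_colour(n_significant, n_views=4):
    # Legend from the Fig. 1 caption: one feature matrix -> purple,
    # two or three -> green, all feature matrices -> red.
    if n_significant == n_views:
        return "red"
    if n_significant >= 2:
        return "green"
    if n_significant == 1:
        return "purple"
    return None
```

A side effect block would then be coloured by counting the feature similarity matrices in which some block gives `overlap_pvalue(...) < 0.05` and passing that count to `rectangle_colour`.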

Fig. 2

The number of features selected by each method from each feature profile

Fig. 3

Venn diagrams showing the overlaps between the features selected by different methods from each feature profile
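
Figs. 2 and 3 summarise which features each method keeps. Assuming (not stated in the captions) that a feature counts as selected when its row of the coefficient matrix has any non-zero entry, the per-profile counts and two-method Venn regions could be computed as in this sketch; `selected_features`, `count_by_profile` and `venn2` are hypothetical helpers.

```python
def selected_features(w, names, tol=0.0):
    # Assumption: a feature is "selected" if any coefficient in its row of W is non-zero.
    return {name for name, row in zip(names, w) if any(abs(c) > tol for c in row)}


def count_by_profile(selected, profile_of):
    # Number of selected features per feature profile (cf. Fig. 2).
    counts = {}
    for name in selected:
        counts[profile_of[name]] = counts.get(profile_of[name], 0) + 1
    return counts


def venn2(a, b):
    # Region sizes of a two-set Venn diagram: (only a, both, only b).
    return len(a - b), len(a & b), len(b - a)
```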

Fig. 4

The average Spearman’s correlation between the feature coefficients learned by multi-LRSL and the feature data extracted from CTD for the same side effect is significantly larger than that of random samples. The blue lines show density estimates of the average correlation coefficients over 1000 random samples; for each random sample, the average correlation is calculated over the same number of pairs of randomly selected feature coefficients and CTD feature data. The red arrows mark the average correlation coefficients between paired feature coefficients and feature data (the feature frequencies for chemical substructures, protein domains and gene ontology terms, and the average gene expression changes). P-values are estimated by a Monte Carlo test.
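
The Monte Carlo comparison in the Fig. 4 caption can be sketched in pure Python as below. This is a hedged illustration rather than the authors' analysis code: it assumes each random pair is formed by drawing a coefficient vector and a CTD vector independently and uniformly at random, that tied values receive average ranks, and that the p-value uses the common add-one estimator. `monte_carlo_test` is a hypothetical name.

```python
import random


def ranks(values):
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant input; treated as 0 here (assumption)
    return cov / (vx * vy) ** 0.5


def monte_carlo_test(coef, ctd, n_samples=1000, seed=0):
    """coef[s], ctd[s]: model coefficients and CTD data for side effect s,
    indexed by the same features. Returns (observed mean rho, p-value)."""
    rng = random.Random(seed)
    n = len(coef)
    observed = sum(spearman(coef[s], ctd[s]) for s in range(n)) / n
    exceed = 0
    for _ in range(n_samples):
        # Assumed null: same number of pairs, with coefficient and CTD vectors
        # drawn independently at random, so they need not refer to one side effect.
        sample = sum(spearman(coef[rng.randrange(n)], ctd[rng.randrange(n)])
                     for _ in range(n)) / n
        exceed += sample >= observed
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (exceed + 1) / (n_samples + 1)
```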

Fig. 5

Prediction results for hepatotoxicity. (a) The X axis shows the features with the largest coefficients for hepatotoxicity (10 features from each feature profile). The Y axis shows the top-ranked predicted drugs: 5 test drugs without any known side effects and 5 drugs with known side effects but no record of hepatotoxicity (the DrugBank IDs of the drugs with known side effects are underlined). The colours on the heat-map represent the values of the selected features in the feature vectors of these drugs. (b) The X axis shows the features with the largest coefficients for hepatotoxicity (10 features from each feature profile), and the Y axis shows the side effects that most frequently co-occur with hepatotoxicity. The colours on the heat-map represent the coefficients learned by multi-LRSL for each side effect. In both (a) and (b), different types of features are separated by grey dashed lines.
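
The selections behind Fig. 5 (the features with the largest coefficients in each profile, and the top-ranked drugs lacking a hepatotoxicity record) can be sketched as follows. This assumes that a drug's score is the linear predictor x·w for the side effect with no intercept and that "largest" means the largest signed coefficient; `top_features` and `rank_candidates` are hypothetical helpers, not part of the published method.

```python
def top_features(coef, profile_of, k=10):
    # coef: {feature: coefficient for one side effect}; keep the k largest per profile.
    groups = {}
    for name, value in coef.items():
        groups.setdefault(profile_of[name], []).append((name, value))
    return {
        profile: [name for name, _ in sorted(items, key=lambda t: t[1], reverse=True)[:k]]
        for profile, items in groups.items()
    }


def rank_candidates(features, coef, known, n=10):
    # features: {drug_id: {feature: value}}; known: drugs already labelled with the side effect.
    # Assumption: score = linear predictor x . w, no intercept.
    scores = {
        drug: sum(coef.get(f, 0.0) * v for f, v in feats.items())
        for drug, feats in features.items()
        if drug not in known
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]
```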

References

    1. Hornberg JJ, Laursen M, Brenden N, Persson M, Thougaard AV, Toft DB, Mow T. Exploratory toxicology as an integrated part of drug discovery. Part I: why and how. Drug Discov Today. 2014;19(8):1131–1136. doi: 10.1016/j.drudis.2013.12.008.
    2. Giacomini KM, Krauss RM, Roden DM, Eichelbaum M, Hayden MR, Nakamura Y. When good drugs go bad. Nature. 2007;446(7139):975–977. doi: 10.1038/446975a.
    3. Liang X, Zhang P, Yan L, Fu Y, Peng F, Qu L, Shao M, Chen Y, Chen Z. LRSSL: predict and interpret drug-disease associations based on data integration using sparse subspace learning. Bioinformatics. 2017;33:1187–1196. doi: 10.1093/bioinformatics/btw770.
    4. Luo H, Wang J, Li M, Luo J, Peng X, Wu F-X, Pan Y. Drug repositioning based on comprehensive similarity measures and bi-random walk algorithm. Bioinformatics. 2016;32:2664–2671. doi: 10.1093/bioinformatics/btw228.
    5. Luo H, Li M, Wang S, Liu Q, Li Y, Wang J. Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics. 2018;34:1904–1912. doi: 10.1093/bioinformatics/bty013.