nature.com

Inconsistency in large pharmacogenomic studies - Nature

  • ️Quackenbush, John
  • ️Wed Nov 27 2013

Acknowledgements

We thank J. Archambault for his insightful comments on the comparison of the experimental protocols used in the large pharmacogenomic studies investigated in this work. We also thank the investigators of the Cancer Genome Project, the Cancer Cell Line Encyclopedia and the GlaxoSmithKline cell line study, who have made their invaluable data available to the scientific community. N.E.-H. was supported by an IRCM doctoral fellowship. A.H.B. was supported by an award from the Klarman Family Foundation and by NIH grant CA087969. N.J.B. was funded by The Villum Kann Rasmussen Foundation. J.Q. was supported by grants from the Dr Miriam and Sheldon G. Adelson Medical Research Foundation and from the NCI GAME-ON Cancer Post-GWAS initiative (U19 CA148065-01).

Author information

Author notes

  1. Andrew H. Beck, Hugo J. W. L. Aerts and John Quackenbush: These authors contributed equally to this work.

Authors and Affiliations

  1. Institut de Recherches Cliniques de Montréal, University of Montreal, Montreal, Quebec, Canada

    Benjamin Haibe-Kains & Nehme El-Hachem

  2. Ontario Cancer Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada

    Benjamin Haibe-Kains

  3. Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark

    Nicolai Juul Birkbak

  4. Department of Pathology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, 02215, Massachusetts, USA

    Andrew C. Jin & Andrew H. Beck

  5. Department of Biostatistics and Computational Biology and Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, 02215, Massachusetts, USA

    Hugo J. W. L. Aerts & John Quackenbush

  6. Department of Radiation Oncology & Radiology, Dana-Farber Cancer Institute, Brigham and Women’s Hospital, Harvard Medical School, Boston, 02215, Massachusetts, USA

    Hugo J. W. L. Aerts

  7. Department of Radiation Oncology, Maastricht University, Maastricht 6200 MD, The Netherlands

    Hugo J. W. L. Aerts

  8. Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, 02215, Massachusetts, USA

    John Quackenbush

Authors

  1. Benjamin Haibe-Kains

  2. Nehme El-Hachem

  3. Nicolai Juul Birkbak

  4. Andrew C. Jin

  5. Andrew H. Beck

  6. Hugo J. W. L. Aerts

  7. John Quackenbush

Contributions

B.H.-K. conceived the study with major contributions from N.J.B., H.J.W.L.A. and J.Q. B.H.-K. and N.E.-H. collected and curated the gene expression profiles and drug phenotypic data. A.C.J. and A.H.B. collected and curated the mutation data. B.H.-K. performed all the analyses and wrote the code with contributions from N.E.-H. and A.C.J. N.E.-H., A.C.J. and A.H.B. compared the experimental protocols of the pharmacogenomic studies. B.H.-K., A.H.B., H.J.W.L.A. and J.Q. supervised the study. B.H.-K., A.H.B., H.J.W.L.A. and J.Q. wrote the manuscript with contributions from N.E.-H. and N.J.B. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to Benjamin Haibe-Kains.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

The R code enables one to download and process the pharmacogenomic data and to generate all the results presented in the paper and its Supplementary Information.

Extended data figures and tables

Extended Data Figure 1 Intersection between the pharmacogenomic studies in terms of drugs, cell lines and genes.

a, Venn diagram reporting the number of drugs shared between CGP and CCLE studies. b, Description of the 15 anticancer drugs screened both in CGP and CCLE studies. c, Venn diagram reporting the number of drugs shared between CGP, CCLE and GSK studies. d, Venn diagram reporting the number of cell lines shared by CGP and CCLE studies. e, Number of cell lines for each tissue type among the 471 common to CGP and CCLE studies. f, Venn diagram reporting the number of cell lines shared between CGP, CCLE and GSK studies. g, Venn diagram reporting the number of genes whose presence of mutations was assessed both in CGP and CCLE studies. h, Venn diagram reporting the number of genes whose expression was assessed both in CGP and CCLE studies.

Extended Data Figure 2 Box plot of the correlations of missense mutation profiles between identical cell lines in CGP and CCLE.

Two‐sided Wilcoxon rank‐sum test was used to test whether agreement (Cohen’s κ coefficient) was significantly higher in identical cell lines compared to different cell lines (upper‐right corner).
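As an illustration of the agreement measure named in this caption, the following is a minimal sketch of Cohen's κ for two mutation profiles of the same cell line, each encoded as a list of 0/1 calls over the same genes. The function name `cohen_kappa` and the toy profiles are hypothetical and are not taken from the paper's R scripts.

```python
def cohen_kappa(profile_a, profile_b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    if len(profile_a) != len(profile_b) or len(profile_a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    labels = set(profile_a) | set(profile_b)
    # Observed agreement: fraction of positions where both calls match.
    p_observed = sum(a == b for a, b in zip(profile_a, profile_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over labels.
    p_expected = sum(
        (profile_a.count(label) / n) * (profile_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        # Both profiles are constant with the same label; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical mutation calls (1 = mutated, 0 = wild type) for eight genes.
study_1_calls = [1, 0, 0, 1, 0, 0, 0, 1]
study_2_calls = [1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(study_1_calls, study_2_calls)
```

For these toy calls the observed agreement is 6/8 and the chance agreement is 34/64, so κ works out to 7/15 (about 0.47). Unlike raw percentage agreement, κ discounts the matches expected from the rarity of mutations alone.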

Extended Data Figure 4 Consistency of IC50 values within the range of tested concentrations between CGP and CCLE.

a, Scatter plots reporting the drug sensitivity measurements, which are the IC50 values within the range of tested concentrations (thus excluding extrapolated IC50 in CGP and placeholder values in CCLE) in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.
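The consistency statistic used throughout these captions, Spearman's rank correlation, can be sketched in a few lines. This is an illustrative stdlib-only version that assigns average ranks to ties and then takes the Pearson correlation of the ranks; the function names and example values are hypothetical, and the significance test (one‐sided P value) is not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical sensitivity values for five cell lines in two studies.
study_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
study_2 = [2.0, 1.0, 4.0, 3.0, 5.0]
rs = spearman(study_1, study_2)
```

Because only the ordering of cell lines matters, the statistic is insensitive to differences in scale between the two studies' measurements.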

Extended Data Figure 5 Consistency of AUC values between CGP and CCLE.

a, Scatter plots reporting the drug sensitivity (AUC) measured in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 6 Consistency of AUC‐based gene–drug associations between CGP and CCLE.

a, Scatter plots reporting the gene–drug associations computed with AUC, as quantified by the standardized coefficient of the gene of interest in a linear model controlled for tissue type, in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.
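The gene–drug association described here, a standardized coefficient in a linear model controlled for tissue type, can be sketched as follows. This is an assumption-laden illustration, not the paper's code: it z-scores expression and sensitivity, then removes tissue effects by centering within each tissue, which gives the same gene coefficient as fitting sensitivity ~ expression + tissue by ordinary least squares (the Frisch–Waugh–Lovell result). The exact standardization used by the authors is not stated in this section, and all names and data below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def zscore(values):
    """Scale values to mean 0 and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def center_within_groups(values, groups):
    """Subtract each group's mean, i.e. residualize on group indicators."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def gene_drug_association(expression, sensitivity, tissue):
    """Standardized coefficient of expression, adjusted for tissue type."""
    x = center_within_groups(zscore(expression), tissue)
    y = center_within_groups(zscore(sensitivity), tissue)
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("expression does not vary within any tissue")
    return sum(a * b for a, b in zip(x, y)) / sxx


# Hypothetical data: within each tissue, sensitivity rises with expression,
# but the tissue with higher expression is less sensitive overall.
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
sensitivity = [1.0, 2.0, 3.0, -9.0, -8.0, -7.0]
tissue = ["lung", "lung", "lung", "skin", "skin", "skin"]
beta = gene_drug_association(expression, sensitivity, tissue)
```

In this toy example the adjusted coefficient is 1, even though the unadjusted correlation across all six cell lines is negative, which shows why the tissue term matters when cell lines from many tissues are pooled.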

Extended Data Figure 7 Consistency of AUC‐based pathway–drug associations between CGP and CCLE.

a, Scatter plots reporting the pathway–drug associations computed with AUC, as quantified by the standardized coefficient of the gene of interest in a linear model controlled for tissue type, in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 8 Consistency of AUC‐based mutation–drug associations between CGP and CCLE.

a, Scatter plots reporting the mutation–drug associations computed with AUC, as quantified by the standardized coefficient of the gene of interest in a linear model controlled for tissue type, in the 471 cell lines and for each of the 15 drugs investigated both in CGP and CCLE. b, Spearman’s rank correlation coefficient (rs) for each drug where significance of each correlation coefficient is reported using an asterisk if one‐sided P value <0.05.

Extended Data Figure 9 Comparison of drug sensitivity measured in CGP and CCLE with GSK.

a, Scatter plots reporting the drug sensitivity measurements (IC50) of all drugs and cell lines screened both in CCLE and GSK data sets (2 drugs in 249 cell lines). b, Scatter plots reporting the drug sensitivity measurements (IC50) of all drugs and cell lines screened both in CGP and GSK data sets (5 drugs in 231 cell lines). Significance of the (positive) Spearman’s rank correlation coefficient is reported as a one‐sided P value.

Extended Data Table 1 Spearman’s rank correlation coefficients and significance for consistency of drug sensitivity, gene–drug and pathway–drug associations for IC50 (a) and AUC (b)

Supplementary information

Supplementary Information

This file contains a list of abbreviations, the instructions to fully reproduce the analysis results from the R scripts, the comparison of pharmacological assays, Supplementary Tables 1-2, Supplementary Figures 1-23 and Supplementary References. (PDF 13904 kb)

Supplementary Data

This file contains Supplementary set 1, R scripts. The archive (zip) contains the R scripts and accompanying files to enable full reproducibility of the analysis results. (ZIP 11045 kb)

Supplementary Data

This zipped file contains Supplementary Data sets 2 and 3. Supplementary File 2, Statistics for the gene-drug-associations for IC50 in CGP reports the gene-drug associations using IC50 as drug sensitivity measure, including the standardized coefficient, its standard error, t statistic, nominal p-value and FDR for the 12,187 genes and 15 drugs screened in CGP. Supplementary File 3, Statistics for the gene-drug-associations for IC50 in CCLE reports the gene-drug associations using IC50 as drug sensitivity measure, including the standardized coefficient, its standard error, t statistic, nominal p-value and FDR for the 12,187 genes and 15 drugs screened in CCLE. (ZIP 25103 kb)

Supplementary Data

This zipped file contains Supplementary Data sets 4-13 and a guide to the data. (ZIP 30820 kb)

About this article

Cite this article

Haibe-Kains, B., El-Hachem, N., Birkbak, N. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013). https://doi.org/10.1038/nature12831

  • Received: 15 April 2013

  • Accepted: 07 November 2013

  • Published: 27 November 2013

  • Issue Date: 19 December 2013

  • DOI: https://doi.org/10.1038/nature12831