Sun Jan 01 2017
Comparative Study
Automated deconvolution of structured mixtures from heterogeneous tumor genomic data
Theodore Roman et al. PLoS Comput Biol. 2017.
Abstract
With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity at the level of individual patients. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate the contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods, however, are limited to coarse approximations of only a few cell subpopulations. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in real breast cancer CNV data from The Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix.
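The abstract mentions fuzzy clustering to identify substructure in sparse, noisy data. To illustrate what soft (fuzzy) membership means, here is a minimal sketch of standard fuzzy c-means in plain Python. It is not necessarily the variant used in WSCUnmix; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the fixed iteration count and the toy data are all assumptions made for illustration.

```python
import math
import random

# Illustrative sketch of standard fuzzy c-means; not the WSCUnmix code.
# The function name and default parameters are hypothetical.


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Standard fuzzy c-means clustering.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j. Each row of u sums to 1, so a sample can be shared between
    clusters instead of being forced into exactly one.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers: means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / wsum
                            for k in range(d)])
        # Memberships: inverse relative distances raised to 2 / (m - 1).
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), eps) for ctr in centers]
            u[i] = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                              for k in range(c))
                    for j in range(c)]
    return centers, u


if __name__ == "__main__":
    # Two tight groups plus one sample halfway between them.
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (2.5, 2.5)]
    centers, u = fuzzy_c_means(data, c=2)
    for p, row in zip(data, u):
        print(p, [round(x, 2) for x in row])
```

In this toy run the samples in each tight group receive a dominant membership in one cluster, while the midpoint is split roughly evenly between the two. That split is the kind of ambiguity a hard clustering would hide.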
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
![Fig 1](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ac1/5695636/6d2c68d9dc6c/pcbi.1005815.g001.gif)
These matrix inputs are passed to our simplicial complex inference code, which infers a mixed membership model of the data and its associated model likelihood. The inference proceeds in five steps: (1) principal components analysis (PCA) to perform dimensionality reduction and denoising of geometric structure; (2) medoidshift pre-clustering to identify low-dimensional sub-manifolds corresponding to distinct submixtures of the data; (3) dimensionality inference via sliver estimation to estimate the likely number of mixture components needed to model each submixture; (4) unmixing on each substructure to identify preliminary mixture decompositions of the submixtures; and (5) a K-nearest-neighbor (KNN) reconciliation model to identify likely shared vertices between submanifolds. Each of these steps is explained in more detail in the main text. The inferred low-dimensional subspaces may be partially intersecting or non-intersecting. We require, however, that the subspaces form a continuous structure, and we merge disconnected subspaces using a maximum likelihood model. The inferred mixture components are then used in downstream functional annotation to identify dysregulated pathways or term associations.
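Step (2) above uses medoidshift pre-clustering. As a rough illustration of that mode-seeking idea, the sketch below implements a basic medoidshift with a Gaussian kernel: each sample shifts to the sample that minimizes the kernel-weighted sum of squared distances, and samples whose shift chains end at the same mode form one pre-cluster. The function name `medoidshift`, the `bandwidth` parameter and the toy data are assumptions; the WSCUnmix implementation may differ in kernel, distances and details.

```python
import math

# Illustrative sketch of basic medoidshift clustering; not the WSCUnmix code.
# The function name and the Gaussian bandwidth parameter are hypothetical.


def medoidshift(points, bandwidth):
    """Cluster points by medoidshift; returns one integer label per point."""
    n = len(points)
    # Pairwise squared Euclidean distances.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
          for p in points]
    # Gaussian kernel weights computed from the squared distances.
    w = [[math.exp(-d2[i][k] / (2.0 * bandwidth ** 2)) for k in range(n)]
         for i in range(n)]
    # Each point shifts to the data point minimizing the kernel-weighted
    # sum of squared distances to all points (a medoid, not a mean).
    nxt = [min(range(n), key=lambda j: sum(d2[j][k] * w[i][k] for k in range(n)))
           for i in range(n)]

    def root(i):
        # Follow shift pointers until a point repeats; the mode is the
        # smallest index on the terminal cycle (usually a single fixed point).
        order, pos = [], {}
        while i not in pos:
            pos[i] = len(order)
            order.append(i)
            i = nxt[i]
        return min(order[pos[i]:])

    roots = [root(i) for i in range(n)]
    # Relabel modes as consecutive integers in order of first appearance.
    labels, ids = [], {}
    for r in roots:
        ids.setdefault(r, len(ids))
        labels.append(ids[r])
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (6.0, 6.0), (6.2, 5.9), (5.9, 6.1)]
    print(medoidshift(data, bandwidth=1.0))
```

Unlike mean shift, every shift lands on an existing sample, so each mode is itself a data point; the bandwidth controls how many pre-clusters emerge.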
![Fig 2](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ac1/5695636/b18bac177937/pcbi.1005815.g002.gif)
![Fig 3](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ac1/5695636/9344a81fca7c/pcbi.1005815.g003.gif)
Note that the inferred tetrahedron was evaluated alongside other candidate simplices and simplicial complexes and was judged the most likely. The data are enclosed in the tetrahedron, and as such can be approximated as mixtures of its vertices. Data points, corresponding to distinct tumor samples plotted in principal component space, are color coded by immunohistological subtype (red circle: Her2+, purple plus: ER/PR+, blue asterisk: triple-negative).
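A point inside a tetrahedron (a 3-simplex) can be written uniquely as a convex combination of its four vertices, and these barycentric coordinates are the mixture fractions the caption refers to. The minimal sketch below assumes that the sample and the four inferred vertices are already given as 3-D principal-component coordinates; the function names `det3`, `barycentric` and `is_inside` are hypothetical.

```python
# Illustrative sketch: mixture fractions as barycentric coordinates.
# Function names are hypothetical, not taken from WSCUnmix.


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def barycentric(x, vertices):
    """Mixture weights of point x with respect to a tetrahedron's 4 vertices.

    Solves x = w1*v1 + w2*v2 + w3*v3 + w4*v4 with w1 + w2 + w3 + w4 = 1
    by Cramer's rule on the edge vectors taken relative to v4.
    """
    v1, v2, v3, v4 = vertices
    e1, e2, e3, r = ([p[i] - v4[i] for i in range(3)] for p in (v1, v2, v3, x))
    d = det3(e1, e2, e3)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (flat) tetrahedron")
    w1 = det3(r, e2, e3) / d
    w2 = det3(e1, r, e3) / d
    w3 = det3(e1, e2, r) / d
    return [w1, w2, w3, 1.0 - w1 - w2 - w3]


def is_inside(weights, tol=1e-9):
    """A point lies in the tetrahedron iff all its weights are nonnegative."""
    return all(w >= -tol for w in weights)


if __name__ == "__main__":
    verts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
    w = barycentric((0.2, 0.3, 0.1), verts)
    print([round(v, 3) for v in w], is_inside(w))
```

A negative weight means the sample falls outside the simplex, which gives one simple way to check whether a candidate simplex actually encloses the data.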
![Fig 4](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ac1/5695636/573f0719ad90/pcbi.1005815.g004.gif)
The inferred structure of three arms sharing a point corresponds to a phylogeny with a single most recent common ancestor and three branches of a tree. Data points, corresponding to distinct tumor samples plotted in principal component space, are color coded by immunohistological subtype (red circle: Her2+, purple plus: ER/PR+, blue asterisk: triple-negative).
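One way to picture a sample on such a three-armed structure is to project it onto each arm (a line segment from the shared vertex to a tip), keep the nearest arm, and read its position along that arm as the fraction of the tip component relative to the shared ancestor. The sketch below does exactly that. It is a hypothetical nearest-arm rule with made-up vertex coordinates and function names (`project_to_segment`, `assign_to_arm`), not the likelihood-based inference used in the paper.

```python
import math

# Illustrative sketch: placing a sample on a star-shaped complex of arms.
# The nearest-arm rule and all names here are hypothetical.


def project_to_segment(x, a, b):
    """Return (t, distance) for the closest point a + t*(b - a), t in [0, 1]."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(p * q for p, q in zip(ax, ab)) / denom
    t = min(1.0, max(0.0, t))  # clamp so the point stays on the segment
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return t, math.dist(x, closest)


def assign_to_arm(x, root, tips):
    """Assign x to the nearest root-to-tip arm.

    Returns (arm index, fraction of the tip component, distance to the arm).
    """
    best = None
    for k, tip in enumerate(tips):
        t, dist = project_to_segment(x, root, tip)
        if best is None or dist < best[2]:
            best = (k, t, dist)
    return best


if __name__ == "__main__":
    root = (0.0, 0.0)  # shared vertex (common ancestor)
    tips = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]  # one tip per branch
    print(assign_to_arm((0.5, 0.1), root, tips))
```

Here a sample is modeled as a mixture of only two components, the shared vertex and one tip, which is what makes a branched complex more constrained than a single simplex over all four vertices.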
![Fig 5](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ac1/5695636/5419832bd01e/pcbi.1005815.g005.gif)
The inferred structure of a tetrahedron and triangle sharing a point corresponds to two phylogenetic branches, one with four components and one with three components. Data points, corresponding to distinct tumor samples plotted in principal component space, are color coded by immunohistological subtype (red circle: Her2+, purple plus: ER/PR+, blue asterisk: triple-negative).
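When substructures such as this tetrahedron and triangle are unmixed separately, their shared vertex is estimated twice and must be reconciled, which is the role of step (5) in the Fig 1 pipeline. The sketch below uses a deliberately simplified 1-nearest-neighbor rule in place of the paper's KNN-based model: find the closest vertex pair across the two simplices and, if it lies within a tolerance, merge the pair into one shared vertex by averaging. The function name `reconcile_shared_vertex`, the `tol` threshold and the averaging step are assumptions.

```python
import math

# Illustrative sketch: a simplified nearest-neighbor rule for reconciling a
# shared vertex. This is not the KNN-based model from the paper; the name,
# tolerance and averaging step are hypothetical.


def reconcile_shared_vertex(verts_a, verts_b, tol):
    """Find the closest vertex pair across two separately inferred simplices.

    If the pair lies within `tol`, treat it as one shared vertex and return
    (index in verts_a, index in verts_b, merged coordinates); else None.
    """
    pairs = ((i, j) for i in range(len(verts_a)) for j in range(len(verts_b)))
    i, j = min(pairs, key=lambda ij: math.dist(verts_a[ij[0]], verts_b[ij[1]]))
    if math.dist(verts_a[i], verts_b[j]) > tol:
        return None
    # Merge the two estimates of the shared vertex by averaging them.
    merged = [(p + q) / 2.0 for p, q in zip(verts_a[i], verts_b[j])]
    return i, j, merged


if __name__ == "__main__":
    tetra = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    tri = [(0.02, 0.01, 1.0), (2.0, 2.0, 2.0), (-1.0, 3.0, 2.0)]
    print(reconcile_shared_vertex(tetra, tri, tol=0.1))
```

Returning `None` when no pair is close enough leaves the two substructures disconnected, which is the situation the caption of Fig 1 says is resolved by a maximum likelihood merge.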