Clumppling: cluster matching and permutation program with integer linear programming
Xiran Liu et al. Bioinformatics. 2024.
Abstract
Motivation: In the mixed-membership unsupervised clustering analyses commonly used in population genetics, multiple replicate data analyses can differ in their clustering solutions. Combinatorial algorithms assist in aligning clustering outputs from multiple replicates so that clustering solutions can be interpreted and combined across replicates. Although several algorithms have been introduced, challenges exist in achieving optimal alignments and performing alignments in reasonable computation time.
Results: We present Clumppling, a method for aligning replicate solutions in mixed-membership unsupervised clustering. The method uses integer linear programming for finding optimal alignments, embedding the cluster alignment problem in standard combinatorial optimization frameworks. In example analyses, we find that it achieves solutions with preferred values of a desired objective function relative to those achieved by Pong and that it proceeds with less computation time than Clumpak. It is also the first method to permit alignments across replicates with multiple arbitrary values of the number of clusters K.
Availability and implementation: Clumppling is available at https://github.com/PopGenClustering/Clumppling.
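To make the alignment problem in the Results concrete, here is a minimal sketch of aligning two replicates that share the same K. It is not the Clumppling implementation: exhaustive search over permutations stands in for the integer linear program, which is only practical for small K. The squared-difference cost, the function names, and the toy membership matrices are all assumptions made for illustration.

```python
from itertools import permutations


def alignment_cost(q_ref, q_other, perm):
    """Sum of squared membership differences after relabeling q_other.

    perm[k] is the column of q_other matched to column k of q_ref.
    """
    return sum(
        (row_ref[k] - row_other[perm[k]]) ** 2
        for row_ref, row_other in zip(q_ref, q_other)
        for k in range(len(perm))
    )


def align_replicates(q_ref, q_other):
    """Return (best_perm, best_cost) by trying every cluster permutation."""
    n_clusters = len(q_ref[0])
    best = min(
        permutations(range(n_clusters)),
        key=lambda p: alignment_cost(q_ref, q_other, p),
    )
    return best, alignment_cost(q_ref, q_other, best)


# Hypothetical toy data: 3 individuals (rows), K = 3 clusters (columns).
q_a = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
# The same clustering solution with its cluster labels shuffled.
q_b = [[row[2], row[0], row[1]] for row in q_a]

perm, cost = align_replicates(q_a, q_b)
# Reorder q_b's columns so that they line up with q_a's.
aligned_b = [[row[j] for j in perm] for row in q_b]
```

Because the two toy replicates differ only by label switching, the search recovers a permutation under which `aligned_b` equals `q_a`. The number of permutations grows as K!, which is one reason a combinatorial optimization formulation is attractive for larger K.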
© The Author(s) 2023. Published by Oxford University Press.
Conflict of interest statement
None declared.
Figures

Clumppling-aligned modes for the Cape Verde dataset (K from 2 to 5), using the mean memberships as mode consensus and the “direct” approach to alignment across K values. The multipartite graph shows the alignment across different K. Edges are colored by the edge weight [Equation (17)]; darker color indicates a larger weight and thus better alignment. The numbers on the edges are the optimal solutions for pairwise alignments, representing minimum values in Equation (12). Each structure plot displays a mode, where the modes for the same K appear in decreasing order by their size—marked in parentheses above the top right corner of each plot—and then their within-mode similarity (if there is a tie in size).

Clumppling-aligned modes for the chicken dataset (K from 17 to 21), using the mean memberships as mode consensus and the “direct” approach to alignment across K values. The figure design follows Figure 1.
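Alignment across different K values, as in the figures above, can be illustrated with a small sketch. The formulation below is an assumption, not taken from this record: each cluster of the larger-K solution is mapped to exactly one cluster of the smaller-K solution, every smaller-K cluster must receive at least one, and the cost is the squared difference after summing the mapped memberships. Exhaustive search again stands in for an integer linear program, and all names and data are hypothetical.

```python
from itertools import product


def merge_columns(q_large, mapping, k_small):
    """Sum the columns of q_large that map to the same smaller-K cluster."""
    merged = []
    for row in q_large:
        out = [0.0] * k_small
        for j, value in enumerate(row):
            out[mapping[j]] += value
        merged.append(out)
    return merged


def squared_distance(q1, q2):
    """Sum of squared entrywise differences between two membership matrices."""
    return sum((a - b) ** 2 for r1, r2 in zip(q1, q2) for a, b in zip(r1, r2))


def align_across_k(q_small, q_large):
    """Return (mapping, cost); mapping[j] is the smaller-K cluster for column j."""
    k_small, k_large = len(q_small[0]), len(q_large[0])
    best_mapping, best_cost = None, float("inf")
    for mapping in product(range(k_small), repeat=k_large):
        if len(set(mapping)) < k_small:
            continue  # skip mappings that leave a smaller-K cluster unused
        cost = squared_distance(q_small, merge_columns(q_large, mapping, k_small))
        if cost < best_cost:
            best_mapping, best_cost = mapping, cost
    return best_mapping, best_cost


# Hypothetical toy data: 3 individuals, one K = 2 and one K = 3 solution.
q_k2 = [[0.875, 0.125], [0.5, 0.5], [0.125, 0.875]]
q_k3 = [[0.125, 0.5, 0.375], [0.5, 0.25, 0.25], [0.875, 0.125, 0.0]]

mapping, cost = align_across_k(q_k2, q_k3)
```

In this toy case the second and third K = 3 clusters together reproduce the first K = 2 cluster, so the best mapping merges them and the cost is zero. In the figures, a per-pair optimum of this kind is what the numbers on the edges report, although the exact objective there is the paper's Equation (12), which is not reproduced in this record.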
Similar articles
- Jakobsson M, Rosenberg NA. Bioinformatics. 2007 Jul 15;23(14):1801-6. doi: 10.1093/bioinformatics/btm233. Epub 2007 May 7. PMID: 17485429
- Llabrés M, Riera G, Rosselló F, Valiente G. BMC Bioinformatics. 2020 Nov 18;21(Suppl 6):434. doi: 10.1186/s12859-020-03733-w. PMID: 33203352 Free PMC article.
- Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. Mol Ecol Resour. 2015 Sep;15(5):1179-91. doi: 10.1111/1755-0998.12387. Epub 2015 Feb 27. PMID: 25684545 Free PMC article.
- Behr AA, Liu KZ, Liu-Fang G, Nakka P, Ramachandran S. pong: fast analysis and visualization of latent clusters in population genetic data. Bioinformatics. 2016 Sep 15;32(18):2817-23. doi: 10.1093/bioinformatics/btw327. Epub 2016 Jun 9. PMID: 27283948 Free PMC article.
- Liu X, Kopelman NM, Rosenberg NA. A Dirichlet model of alignment cost in mixed-membership unsupervised clustering. J Comput Graph Stat. 2023;32(3):1145-1159. doi: 10.1080/10618600.2022.2127739. Epub 2022 Nov 14. PMID: 37982130 Free PMC article.