pubmed.ncbi.nlm.nih.gov

Clumppling: cluster matching and permutation program with integer linear programming - PubMed

  • ️Mon Jan 01 2024

Clumppling: cluster matching and permutation program with integer linear programming

Xiran Liu et al. Bioinformatics. 2024.

Abstract

Motivation: In the mixed-membership unsupervised clustering analyses commonly used in population genetics, multiple replicate data analyses can differ in their clustering solutions. Combinatorial algorithms assist in aligning clustering outputs from multiple replicates so that clustering solutions can be interpreted and combined across replicates. Although several algorithms have been introduced, challenges exist in achieving optimal alignments and performing alignments in reasonable computation time.

Results: We present Clumppling, a method for aligning replicate solutions in mixed-membership unsupervised clustering. The method uses integer linear programming for finding optimal alignments, embedding the cluster alignment problem in standard combinatorial optimization frameworks. In example analyses, we find that it achieves solutions with preferred values of a desired objective function relative to those achieved by Pong and that it proceeds with less computation time than Clumpak. It is also the first method to permit alignments across replicates with multiple arbitrary values of the number of clusters K.

Availability and implementation: Clumppling is available at https://github.com/PopGenClustering/Clumppling.

© The Author(s) 2023. Published by Oxford University Press.

PubMed Disclaimer

Conflict of interest statement

None declared.

Figures

Figure 1.
Figure 1.

Clumppling-aligned modes for the Cape Verde dataset (K from 2 to 5), using the mean memberships as mode consensus and the “direct” approach to alignment across K values. The multipartite graph shows the alignment across different K. Edges are colored by the edge weight [Equation (17)]; darker color indicates a larger weight and thus better alignment. The numbers on the edges are the optimal solutions for pairwise alignments, representing minimum values in Equation (12). Each structure plot displays a mode, where the modes for the same K appear in decreasing order by their size—marked in parentheses above the top right corner of each plot—and then their within-mode similarity (if there is a tie in size).

Figure 2.
Figure 2.

Clumppling-aligned modes for the chicken dataset (K from 17 to 21), using the mean memberships as mode consensus and the “direct” approach to alignment across K values. The figure design follows Figure 1.

Similar articles

References

    1. Airoldi EM, Blei D, Erosheva EA. et al. Handbook of Mixed Membership Models and Their Applications. Boca Raton, FL: CRC Press, 2014.
    1. Alexander DH, Novembre J, Lange K.. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009;19:1655–64. - PMC - PubMed
    1. Behr AA, Liu KZ, Liu-Fang G. et al. Pong: fast analysis and visualization of latent clusters in population genetic data. Bioinformatics 2016;32:2817–23. - PMC - PubMed
    1. Blondel VD, Guillaume J-L, Lambiotte R. et al. Fast unfolding of communities in large networks. J Stat Mech 2008;2008:P10008.
    1. Burkard R, Dell’Amico M, Martello S.. Assignment Problems. Philadelphia: Society for Industrial and Applied Mathematics, 2009.

Publication types

MeSH terms