Figure 2: Schematic of workflow for identification and interpretation of clusters. | Nature Communications

Parts a–c summarize the identification of clusters: (a) constructing a network from identity-by-descent (IBD) data, (b) detecting network clusters, and (c) identifying subsets of clusters that separate in the spectral embedding. Part d summarizes the interpretation of clusters, which are annotated with admixture and genealogical data. Part e summarizes the genealogical data (birth location annotations in pedigrees, shown as shaded symbols in d) for the ‘African American’ cluster. In e, each birth location in the pedigree (here, in generations 0–9, where generation 0 is the genotyped individual) is converted to the nearest coordinate on a grid with points every 0.5° of latitude and longitude. Point size is scaled by the number of birth location annotations in the cluster at the given location, and points are coloured by odds ratio (OR): the proportion of ancestral birth locations linked to cluster members at that map location divided by the proportion linked to non-cluster members at the same location. Points with higher odds ratios mark geographic locations that are more strongly associated with cluster membership. Maps were generated with the maps R package using data from the Natural Earth Project (1:50m world map, version 2.0). These data are made available in the public domain (Creative Commons CC0).
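
The grid snapping and odds ratio described for part e can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the function names, the input format (lists of latitude and longitude pairs), and the choice to return infinity where no non-cluster annotations fall on a grid point are all assumptions made for the example.

```python
import math
from collections import Counter

GRID_STEP = 0.5  # grid spacing in degrees, as stated in the caption


def snap_to_grid(lat, lon, step=GRID_STEP):
    """Return the grid point nearest to (lat, lon). Hypothetical helper."""
    snap = lambda value: math.floor(value / step + 0.5) * step
    return (snap(lat), snap(lon))


def location_odds_ratios(cluster_births, other_births, step=GRID_STEP):
    """Map each grid point with cluster annotations to (count, OR).

    OR follows the caption's definition: the proportion of cluster-member
    birth locations at the point divided by the proportion of non-cluster
    birth locations at the same point.
    """
    in_counts = Counter(snap_to_grid(lat, lon, step) for lat, lon in cluster_births)
    out_counts = Counter(snap_to_grid(lat, lon, step) for lat, lon in other_births)
    n_in = sum(in_counts.values())
    n_out = sum(out_counts.values())

    result = {}
    for point, count in in_counts.items():
        p_in = count / n_in
        out = out_counts.get(point, 0)
        # Assumption: with no non-cluster annotations the ratio is undefined;
        # it is reported as infinity here.
        ratio = p_in / (out / n_out) if out else math.inf
        result[point] = (count, ratio)
    return result


# Made-up coordinates, for illustration only.
cluster_births = [(33.7, -84.4), (33.4, -84.6), (40.7, -74.0)]
other_births = [(33.6, -84.5), (40.7, -74.0), (40.6, -74.1), (40.4, -73.9)]
ratios = location_odds_ratios(cluster_births, other_births)
for point, (count, ratio) in sorted(ratios.items()):
    print(point, count, round(ratio, 3))
```

In a plot like part e, the count would set the point size and the ratio would set the colour.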