pubmed.ncbi.nlm.nih.gov

Systematic Review on Local Ancestor Inference From a Mathematical and Algorithmic Perspective - PubMed

  • Fri Jan 01 2021

Review

Jie Wu et al. Front Genet. 2021.

Abstract

Genotypic data provide deep insights into population history and medical genetics. The local ancestry inference (LAI) method (also termed local ancestry deconvolution) uses the hidden Markov model (HMM) to solve the mathematical problem of ancestry reconstruction from genomic data. In a series of computational tools, the HMM is combined with other statistical models and machine learning techniques for particular genetic tasks. In this article, we survey in detail the mathematical structure, application characteristics, historical development, and benchmark analysis of LAI methods, to help researchers better understand and further develop them. First, we explore the mathematical structure of each model and its characteristic applications. Next, we use bibliometrics to show the fields in which the models are applied and list articles to trace the historical development. LAI publications peaked during 2006-2016 and have continued to appear in the following years. We evaluated the efficiency, accuracy, and stability of the existing models with a benchmark and found that phased data gave higher accuracy than unphased data. We summarize the distinct advantages and disadvantages of these models. The Loter model uses dynamic programming to obtain a globally optimal solution and has the advantage of being parameter-free. The Seqmix model can use aligned bases directly when the genotype is hard to call. This research may help model developers recognize current challenges and develop more advanced models, and may help scholars select appropriate models for given populations and datasets.
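
To make the HMM formulation concrete, the following is a minimal illustrative sketch of Viterbi decoding for one phased haplotype. It is not the model of any tool surveyed in the review. It assumes that hidden states are ancestral populations, that emissions come from known per-population allele frequencies, and that the ancestry switch probability between adjacent markers is a single constant. The function name `viterbi_local_ancestry` and all parameter names are hypothetical.

```python
import math


def viterbi_local_ancestry(haplotype, allele_freqs, switch_prob):
    """Return the most likely ancestry label at each marker.

    haplotype    : list of 0/1 alleles, one per marker (phased data)
    allele_freqs : allele_freqs[k][j] is the assumed frequency of allele 1
                   at marker j in reference population k (needs >= 2 populations)
    switch_prob  : assumed constant probability (0 < p < 1) that ancestry
                   changes between two adjacent markers
    """
    n_pops = len(allele_freqs)
    n_markers = len(haplotype)
    if n_markers == 0:
        return []

    def log_emit(k, j):
        # Clamp frequencies so that log(0) never occurs.
        p = min(max(allele_freqs[k][j], 1e-6), 1 - 1e-6)
        return math.log(p if haplotype[j] == 1 else 1 - p)

    log_stay = math.log(1 - switch_prob)
    log_switch = math.log(switch_prob / (n_pops - 1))

    # Initialisation: uniform prior over ancestries at the first marker.
    score = [math.log(1 / n_pops) + log_emit(k, 0) for k in range(n_pops)]
    back = []  # back[j-1][k] = best previous state for state k at marker j

    # Recursion: dynamic programming over markers, in log space.
    for j in range(1, n_markers):
        new_score, pointers = [], []
        for k in range(n_pops):
            best_prev = max(
                range(n_pops),
                key=lambda i: score[i] + (log_stay if i == k else log_switch),
            )
            trans = log_stay if best_prev == k else log_switch
            new_score.append(score[best_prev] + trans + log_emit(k, j))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Traceback from the best final state.
    state = max(range(n_pops), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For example, with two reference populations whose allele-1 frequencies are 0.9 and 0.1 at every marker, the haplotype `[1, 1, 1, 0, 0, 0]` and `switch_prob=0.05` decode to a single ancestry switch in the middle of the segment. Real tools typically work with richer emission models and recombination-aware transition rates, which this sketch deliberately omits.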

Keywords: HMM; LAI model; benchmark; bibliometrics; mathematical structure.

Copyright © 2021 Wu, Liu and Zhao.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1

(A) Number of local ancestry inference (LAI) publications from 2000 to 2020. (B) A visual survey of the major topics on LAI by the Carrot II system. (C) Distribution of models across the top four research fields including wild domestication, history of human evolution, disease risk, and ancient DNA. (D) Timeline illustrating the development of LAI.

FIGURE 2

Box plots of the accuracy of local ancestry inference (LAI) using a benchmark. The red hollow arrows indicate a higher accuracy by the median comparison in this simulation. The results showed that phased data had higher accuracy in comparison with unphased data.
