Allelic mapping bias in RNA-sequencing is not a major confounder in eQTL studies
- ️Wed Jan 01 2014
Nikolaos I Panousis et al. Genome Biol. 2014.
Abstract
Background: RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of a variant locus can have a lower probability of mapping correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias on eQTL discovery.
Results: We simulate RNA-seq read mapping over 9.5 million common SNPs and indels, with 15.6% of variants showing a biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and on eQTL discovery. We detect only a handful of likely false positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.
Conclusion: Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias, and that this bias does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better control for mapping bias and more accurate results in future RNA-seq studies.
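The idea of a per-variant mapping bias can be illustrated with a minimal sketch. It is not the authors' pipeline: the `SimulatedVariant` record, the assumption that equal numbers of reference and non-reference reads were simulated per variant (so an unbiased ratio is 0.5), and the 5% tolerance (borrowed from the dotted lines in the figure captions) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SimulatedVariant:
    """Correctly mapped read counts for one variant (hypothetical record)."""
    variant_id: str
    ref_mapped: int      # simulated reference-allele reads that mapped correctly
    nonref_mapped: int   # simulated non-reference-allele reads that mapped correctly


def reference_allele_ratio(v: SimulatedVariant) -> float:
    """Fraction of correctly mapped reads that carry the reference allele."""
    total = v.ref_mapped + v.nonref_mapped
    if total == 0:
        raise ValueError(f"no mapped reads for {v.variant_id}")
    return v.ref_mapped / total


def is_biased(v: SimulatedVariant, tolerance: float = 0.05) -> bool:
    """Flag a variant whose ratio deviates from 0.5 by more than `tolerance`.

    Assumes equal numbers of reads were simulated for both alleles;
    the 5% tolerance is an illustrative choice, not a value from the paper.
    """
    return abs(reference_allele_ratio(v) - 0.5) > tolerance


def fraction_biased(variants: list[SimulatedVariant], tolerance: float = 0.05) -> float:
    """Share of variants flagged as biased under the same tolerance."""
    if not variants:
        raise ValueError("no variants given")
    return sum(is_biased(v, tolerance) for v in variants) / len(variants)


# Toy counts, invented for the example.
toy = [
    SimulatedVariant("var_a", ref_mapped=100, nonref_mapped=100),
    SimulatedVariant("var_b", ref_mapped=100, nonref_mapped=80),
    SimulatedVariant("var_c", ref_mapped=100, nonref_mapped=0),
    SimulatedVariant("var_d", ref_mapped=98, nonref_mapped=100),
]
print(fraction_biased(toy))
```

Reads overlapping the flagged variants could then be removed before quantification, which is the kind of filtering whose effect the study evaluates.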
Figures

Estimation of mapping bias and its effect on expression quantifications. (A) Mapping bias in SNPs and indels of 1000 Genomes estimated by simulated single-end 50 bp RNA-seq reads based on the genome sequence and aligned with BWA. (B) The impact of filtering for reads with simulated mapping bias on exon quantifications (log10 scale).

Correlation of mapping bias with different alignment methods and read types. (A, B) Spearman correlation of reference allele ratio for SNPs (A) and indels (B) in simulations with different mappers and different read building strategies.

Comparison of allelic ratios with different read types. Reference allele ratio obtained from simulated reads over SNPs (A, C) and indels (B, D) comparing genome- versus transcriptome-based reads (A, B) in single-end 50 bp reads mapped with the GEM mapper, and comparing single- versus paired-end reads (C, D) in genome-based 50 bp reads mapped with BWA. The dotted lines denote a 5% difference in the ratio for the reference/non-reference allele. The color scale from dark blue to light blue denotes the density of the points.

Effect of mapping bias on eQTL discovery. (A) Comparison of original and filtered P values (rho = 0.92, P value < 2.2e-16) shows that for the vast majority of genes, the P values after filtering potentially biased reads are highly consistent with the P values without filtering. Colors denote whether a gene with an eQTL was significant only before filtering putatively biased reads (lost), only after filtering (gained), or in both analyses (common). The dotted lines denote 10% FDR significance thresholds. (B) The number of exons per gene with significant associations as a function of the total number of quantified exons in the original, non-filtered dataset. (C) Proportion of the best-associated exon per gene with genetic variants (see also Additional file 1: Table S5). (D) Proportion of biased variants in six different categories, based on simulated single-end, genome-based reads mapped with BWA. Matched eQTL null is a random sample of variants matched to eQTLs by distance from the TSS.
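The lost/gained/common labelling in panel (A) can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it uses the Benjamini-Hochberg step-up rule to apply a 10% FDR threshold, whereas the abstract does not say how the FDR was computed, and the function names and toy P values are invented for the example.

```python
def benjamini_hochberg(pvalues: list[float], fdr: float = 0.10) -> list[bool]:
    """Return True for each P value that passes the BH step-up rule at `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


def classify_genes(original: dict[str, float],
                   filtered: dict[str, float],
                   fdr: float = 0.10) -> dict[str, str]:
    """Label each gene by its significance before and after read filtering."""
    if original.keys() != filtered.keys():
        raise ValueError("both analyses must cover the same genes")
    genes = sorted(original)
    sig_orig = dict(zip(genes, benjamini_hochberg([original[g] for g in genes], fdr)))
    sig_filt = dict(zip(genes, benjamini_hochberg([filtered[g] for g in genes], fdr)))
    labels = {}
    for g in genes:
        if sig_orig[g] and sig_filt[g]:
            labels[g] = "common"
        elif sig_orig[g]:
            labels[g] = "lost"      # significant only before filtering
        elif sig_filt[g]:
            labels[g] = "gained"    # significant only after filtering
        else:
            labels[g] = "not significant"
    return labels


# Toy best-association P values per gene, invented for the example.
p_original = {"gene_a": 0.001, "gene_b": 0.02, "gene_c": 0.5, "gene_d": 0.9}
p_filtered = {"gene_a": 0.002, "gene_b": 0.3, "gene_c": 0.04, "gene_d": 0.8}
print(classify_genes(p_original, p_filtered))
```

Genes labelled "lost" are the candidates for false positives driven by mapping bias, since their association disappears once putatively biased reads are removed.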
Similar articles
- Khansefid M, Pryce JE, Bolormaa S, Chen Y, Millen CA, Chamberlain AJ, Vander Jagt CJ, Goddard ME. BMC Genomics. 2018 Nov 3;19(1):793. doi: 10.1186/s12864-018-5181-0. PMID: 30390624. Free PMC article.
- Sun W. A statistical framework for eQTL mapping using RNA-seq data. Biometrics. 2012 Mar;68(1):1-11. doi: 10.1111/j.1541-0420.2011.01654.x. Epub 2011 Aug 12. PMID: 21838806. Free PMC article.
- Fan KH, Devos KM, Schliekelman P. Strategies for eQTL mapping in allopolyploid organisms. Theor Appl Genet. 2020 Aug;133(8):2477-2497. doi: 10.1007/s00122-020-03612-1. Epub 2020 May 27. PMID: 32462429.
- Maria M, Pouyanfar N, Örd T, Kaikkonen MU. The Power of Single-Cell RNA Sequencing in eQTL Discovery. Genes (Basel). 2022 Mar 12;13(3):502. doi: 10.3390/genes13030502. PMID: 35328055. Free PMC article. Review.
- Majewski J, Pastinen T. The study of eQTL variations by RNA-seq: from SNPs to phenotypes. Trends Genet. 2011 Feb;27(2):72-9. doi: 10.1016/j.tig.2010.10.006. Epub 2010 Nov 29. PMID: 21122937. Review.
Cited by
- Thibodeau SN, French AJ, McDonnell SK, Cheville J, Middha S, Tillmans L, Riska S, Baheti S, Larson MC, Fogarty Z, Zhang Y, Larson N, Nair A, O'Brien D, Wang L, Schaid DJ. Nat Commun. 2015 Nov 27;6:8653. doi: 10.1038/ncomms9653. PMID: 26611117. Free PMC article.
- Zhan S, Griswold C, Lukens L. Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias. BMC Genomics. 2021 Apr 20;22(1):285. doi: 10.1186/s12864-021-07577-3. PMID: 33874908. Free PMC article.
- Zhao C, Xie S, Wu H, Luan Y, Hu S, Ni J, Lin R, Zhao S, Zhang D, Li X. Sci Rep. 2019 Apr 19;9(1):6334. doi: 10.1038/s41598-019-42815-5. PMID: 31004110. Free PMC article.
- Tabassum R, Sivadas A, Agrawal V, Tian H, Arafat D, Gibson G. Genome Med. 2015 Aug 13;7(1):88. doi: 10.1186/s13073-015-0209-4. PMID: 26391122. Free PMC article.
Grants and funding
- DA006227/DA/NIDA NIH HHS/United States
- R01 MH090936/MH/NIMH NIH HHS/United States
- MH090948/MH/NIMH NIH HHS/United States
- MH090936/MH/NIMH NIH HHS/United States
- R01 MH090951/MH/NIMH NIH HHS/United States
- MH090951/MH/NIMH NIH HHS/United States
- R01 MH090937/MH/NIMH NIH HHS/United States
- R01 DA006227/DA/NIDA NIH HHS/United States
- MH090937/MH/NIMH NIH HHS/United States
- HHSN261200800001C/CA/NCI NIH HHS/United States
- MH090941/MH/NIMH NIH HHS/United States
- R01 MH090948/MH/NIMH NIH HHS/United States
- R01 MH090941/MH/NIMH NIH HHS/United States
- HHSN268201000029C/HL/NHLBI NIH HHS/United States
- HHSN261200800001E/CA/NCI NIH HHS/United States