
Allelic mapping bias in RNA-sequencing is not a major confounder in eQTL studies

Wed Jan 01 2014


Nikolaos I Panousis et al. Genome Biol. 2014.

Abstract

Background: RNA sequencing (RNA-seq) is the current gold-standard method for quantifying gene expression in expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of a variant can have a lower probability of mapping correctly to the reference genome, which could bias gene quantifications and cause false-positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias on eQTL discovery.
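To make the mechanism concrete, here is a toy calculation. It is a minimal sketch under invented assumptions (reads that do not overlap the variant always map correctly, reads over the variant carry each allele in proportion to the genotype, and the mapping probabilities are made-up numbers), not the simulation used in the study. It shows how a lower mapping rate for non-reference reads alone can produce a genotype-dependent read count for a gene whose true expression does not change.

```python
"""Toy model (hypothetical, not the study's pipeline) of how allelic mapping
bias can mimic an eQTL: the gene's true output is identical for every
genotype, yet the expected number of correctly mapped reads is not."""


def expected_mapped_reads(true_reads: float,
                          frac_over_variant: float,
                          p_map_ref: float,
                          p_map_alt: float,
                          alt_alleles: int) -> float:
    """Expected correctly mapped reads for one gene in one individual.

    true_reads:        reads produced by the gene (no real genotype effect)
    frac_over_variant: fraction of those reads that overlap the variant
    p_map_ref:         probability a read carrying the REF allele maps correctly
    p_map_alt:         probability a read carrying the ALT allele maps correctly
    alt_alleles:       genotype as the number of ALT copies (0, 1 or 2)
    """
    if alt_alleles not in (0, 1, 2):
        raise ValueError("alt_alleles must be 0, 1 or 2")
    alt_fraction = alt_alleles / 2
    # Assumption: reads that do not overlap the variant always map correctly.
    unaffected = true_reads * (1 - frac_over_variant)
    # Reads over the variant carry REF or ALT in proportion to the genotype.
    overlapping = true_reads * frac_over_variant
    mapping_rate = (1 - alt_fraction) * p_map_ref + alt_fraction * p_map_alt
    return unaffected + overlapping * mapping_rate


if __name__ == "__main__":
    # Made-up numbers: 20% of reads cover the variant, ALT reads map 80% of the time.
    for genotype in (0, 1, 2):
        print(genotype, expected_mapped_reads(1000, 0.2, 1.0, 0.8, genotype))
```

With these invented parameters the expected count falls from about 1000 (no ALT copies) to about 980 (one copy) and 960 (two copies), a monotone genotype trend that a naive association test could report as an eQTL.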

Results: We simulate RNA-seq read mapping over 9.5 million common SNPs and indels, and 15.6% of variants show a biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and on eQTL discovery. We detect only a handful of likely false-positive eQTLs, and overall, eQTL SNPs show no significant enrichment for high mapping bias.
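The per-variant bias measure can be sketched as follows, under assumptions of our own: for each variant, equal numbers of reads carrying the reference (REF) and non-reference (ALT) allele are simulated and mapped, so an unbiased locus should yield a reference ratio of 0.5, and a variant is flagged when its ratio deviates from 0.5 by more than a tolerance. The 0.05 tolerance is illustrative and not necessarily the study's cutoff; the `SimulatedVariant` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SimulatedVariant:
    """Hypothetical per-variant summary of a read-mapping simulation."""
    variant_id: str
    ref_mapped: int  # simulated REF-carrying reads that mapped back to the true locus
    alt_mapped: int  # simulated ALT-carrying reads that mapped back to the true locus


def reference_ratio(v: SimulatedVariant) -> Optional[float]:
    """Fraction of correctly mapped reads that carry the REF allele."""
    total = v.ref_mapped + v.alt_mapped
    if total == 0:
        return None  # no read mapped correctly; the ratio is undefined
    return v.ref_mapped / total


def flag_biased(variants: Iterable[SimulatedVariant],
                tolerance: float = 0.05) -> List[str]:
    """IDs of variants whose REF ratio deviates from 0.5 by more than `tolerance`.

    Variants with an undefined ratio are skipped rather than flagged.
    """
    biased = []
    for v in variants:
        ratio = reference_ratio(v)
        if ratio is not None and abs(ratio - 0.5) > tolerance:
            biased.append(v.variant_id)
    return biased


if __name__ == "__main__":
    sims = [
        SimulatedVariant("snp_a", 50, 50),  # ratio 0.5   -> unbiased
        SimulatedVariant("snp_b", 50, 30),  # ratio 0.625 -> biased toward REF
        SimulatedVariant("indel_c", 0, 0),  # undefined   -> skipped
    ]
    print(flag_biased(sims))  # prints ['snp_b']
```

Dividing the number of flagged variants by the number with a defined ratio gives a percentage of biased variants; its value depends on the simulation design and on the chosen tolerance.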

Conclusion: Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias and that this bias does not severely affect eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better control of mapping bias and more accurate results in future RNA-seq studies.


Figures

Figure 1

Estimation of mapping bias and its effect on expression quantifications. (A) Mapping bias at 1000 Genomes SNPs and indels, estimated from simulated single-end 50 bp RNA-seq reads built from the genome sequence and aligned with BWA. (B) Impact on exon quantifications (log10 scale) of filtering out reads with simulated mapping bias.
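Panel B compares exon quantifications with and without reads affected by mapping bias. A minimal sketch of that comparison follows, under simplifying assumptions that are ours rather than the study's: reads and exons are half-open intervals `[start, end)` on a single chromosome, splicing and read pairs are ignored, a read is dropped if it covers any position in a hypothetical set of biased variant positions, and a pseudocount of 1 is added before taking log10.

```python
import math
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def covers_biased_site(read: Interval, biased_sorted: List[int]) -> bool:
    """True if any biased position p satisfies start <= p < end."""
    start, end = read
    i = bisect_left(biased_sorted, start)  # first biased position >= start
    return i < len(biased_sorted) and biased_sorted[i] < end


def exon_counts(reads: Iterable[Interval],
                exons: Dict[str, Interval],
                biased_positions: Iterable[int] = ()) -> Dict[str, int]:
    """Count reads overlapping each exon, after dropping reads over biased sites."""
    biased_sorted = sorted(biased_positions)
    kept = [r for r in reads if not covers_biased_site(r, biased_sorted)]
    return {
        name: sum(1 for s, e in kept if s < ex_end and ex_start < e)
        for name, (ex_start, ex_end) in exons.items()
    }


def log10_change(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Per-exon log10(after + 1) - log10(before + 1); 0 means no change."""
    return {name: math.log10(after[name] + 1) - math.log10(before[name] + 1)
            for name in before}


if __name__ == "__main__":
    exons = {"exon1": (100, 200), "exon2": (400, 500)}
    reads = [(90, 140), (150, 190), (160, 210), (420, 470)]
    before = exon_counts(reads, exons)
    after = exon_counts(reads, exons, biased_positions={155})
    print(before, after, log10_change(before, after))
```

Here only the read spanning position 155 is removed, so `exon1` drops from 3 to 2 reads while `exon2` is unchanged; in real data the same idea would be applied to aligned reads and spliced exon coordinates.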

Figure 2

Correlation of mapping bias across alignment methods and read types. (A, B) Spearman correlation of the reference allele ratio for SNPs (A) and indels (B) in simulations with different mappers and different read-building strategies.
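Panels A and B summarize agreement between pipelines with a Spearman correlation of per-variant reference allele ratios. A dependency-free sketch of that coefficient follows; it assumes the two input lists hold ratios for the same variants in the same order (variants with an undefined ratio in either pipeline removed beforehand) and handles ties with average ranks. The example ratios are invented. In practice `scipy.stats.spearmanr` computes the same coefficient.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical REF ratios for the same five variants from two mappers.
    mapper_1 = [0.50, 0.62, 0.48, 0.90, 0.50]
    mapper_2 = [0.51, 0.60, 0.47, 0.85, 0.52]
    print(round(spearman(mapper_1, mapper_2), 3))
```

Ranking before correlating makes the comparison insensitive to monotone differences in how two mappers scale the ratio, which suits a question about whether they rank the same variants as most biased.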

Figure 3

Comparison of allelic ratios with different read types. Reference allele ratios obtained from simulated reads over SNPs (A, C) and indels (B, D), comparing genome- versus transcriptome-based reads (A, B) in single-end 50 bp reads mapped with the GEM mapper, and single- versus paired-end reads (C, D) in genome-based 50 bp reads mapped with BWA. The dotted lines denote a 5% difference in the reference/non-reference allele ratio. The color scale from dark blue to light blue denotes point density.
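One way to quantify whether two read types flag the same variants is to classify each variant as biased or unbiased under each read type and tabulate the agreement. The sketch below reads the 5% band as a deviation of more than 0.05 from the equal ratio of 0.5; that interpretation, the dictionary input format, the example ratios, and the function names are our assumptions.

```python
from collections import Counter
from typing import Dict, Tuple


def is_biased(ratio: float, tolerance: float = 0.05) -> bool:
    """Assumed rule: biased if the REF ratio lies outside 0.5 +/- tolerance."""
    return abs(ratio - 0.5) > tolerance


def concordance(ratios_a: Dict[str, float],
                ratios_b: Dict[str, float],
                tolerance: float = 0.05) -> Counter:
    """2x2 table keyed by (biased under A, biased under B) over shared variants."""
    table: Counter = Counter()
    for vid in ratios_a.keys() & ratios_b.keys():
        key: Tuple[bool, bool] = (is_biased(ratios_a[vid], tolerance),
                                  is_biased(ratios_b[vid], tolerance))
        table[key] += 1
    return table


if __name__ == "__main__":
    single_end = {"v1": 0.50, "v2": 0.70, "v3": 0.52, "v4": 0.30}
    paired_end = {"v1": 0.50, "v2": 0.60, "v3": 0.90}
    table = concordance(single_end, paired_end)
    agree = table[(True, True)] + table[(False, False)]
    print(dict(table), agree / sum(table.values()))
```

Replacing the rule with `abs(ratio_a - ratio_b) > 0.05` would instead count variants whose ratio shifts between read types; the caption does not say which of the two readings the dotted lines represent.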

Figure 4

Effect of mapping bias on eQTL discovery. (A) Comparison of original and filtered P values (rho = 0.92, P < 2.2e-16) shows that, for the vast majority of genes, P values after filtering potentially biased reads are highly consistent with those obtained without filtering. Colors denote whether a gene's eQTL was significant only before filtering putatively biased reads (lost), only after filtering (gained), or in both analyses (common). The dotted lines denote 10% FDR significance thresholds. (B) Number of exons per gene with significant associations as a function of the total number of quantified exons in the original, non-filtered dataset. (C) Proportion of the best-associated exon per gene with genetic variants (see also Additional file 1: Table S5). (D) Proportion of biased variants in six different categories, based on simulated single-end, genome-based reads mapped with BWA. Matched eQTL null is a random sample of variants matched to the eQTLs by distance from the TSS.
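Panel A labels genes as lost, gained or common at a 10% FDR. The caption does not say how the FDR was controlled, so the sketch below substitutes the Benjamini-Hochberg procedure as a stand-in; it assumes one P value per gene (for example, that of its best eQTL) before and after filtering, and the gene names, P values and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(original: Dict[str, float],
                   filtered: Dict[str, float],
                   fdr: float = 0.10) -> Dict[str, str]:
    """Label genes 'common', 'lost', 'gained' or 'none' at the given FDR.

    Only genes tested in both analyses are compared, so both BH corrections
    use the same number of tests.
    """
    genes = sorted(original.keys() & filtered.keys())
    q_orig = bh_adjust([original[g] for g in genes])
    q_filt = bh_adjust([filtered[g] for g in genes])
    labels = {}
    for gene, qo, qf in zip(genes, q_orig, q_filt):
        if qo <= fdr and qf <= fdr:
            labels[gene] = "common"
        elif qo <= fdr:
            labels[gene] = "lost"    # significant only before filtering
        elif qf <= fdr:
            labels[gene] = "gained"  # significant only after filtering
        else:
            labels[gene] = "none"
    return labels


if __name__ == "__main__":
    before = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.5}
    after = {"GENE_A": 0.002, "GENE_B": 0.5, "GENE_C": 0.01}
    print(classify_genes(before, after))
    # {'GENE_A': 'common', 'GENE_B': 'lost', 'GENE_C': 'gained'}
```

In this framing, genes labelled "lost" are natural candidates for eQTLs driven by mapping bias, since their signal disappears once potentially biased reads are removed.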
