A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set
Tue Jan 01 2019
Boas Pucker et al. PLoS One. 2019.
Abstract
In addition to the BAC-based reference sequence of the accession Columbia-0 (Col-0) from the year 2000, several short-read assemblies of the plant model organism Arabidopsis thaliana have been published in recent years. A SMRT-based assembly of Landsberg erecta (Ler) has also been generated, which identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short-read-based NGS assembly (AthNd-1_v1), a 75-fold increase in contiguity was observed for AthNd-1_v2c. To assign contig locations independently of the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Detailed analyses of AthNd-1_v2c allowed reliable identification of large genomic rearrangements between A. thaliana accessions, which contribute to differences in the gene sets that distinguish the genotypes. One of the detected differences revealed a gene that is lacking from the Col-0 gold standard sequence. This de novo assembly extends the known proportion of the A. thaliana pan-genome.
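The abstract reports a fold increase in contiguity without naming the metric behind it. A common way to summarize contiguity is N50, the contig length at which the cumulative length of the longest contigs reaches half of the total assembly size. The sketch below assumes that metric purely for illustration; the function name and the contig lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy contig lengths (arbitrary units), NOT data from the paper.
short_read_contigs = [40, 30, 20, 10]
long_read_contigs = [900, 600, 300, 200]

# Fold change in contiguity between the two toy assemblies.
fold_increase = n50(long_read_contigs) / n50(short_read_contigs)
print(f"N50 fold increase: {fold_increase:.1f}")
```

With real data, the length lists would come from the contig records of each assembly's FASTA file, and the ratio of the two N50 values gives a fold change comparable in spirit to the one quoted above.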
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures

Schematic pseudochromosomes are represented by black lines, with positions of genomic features highlighted with colored icons as indicated in the insert.

The dotplot heatmaps show the similarity between small fragments of two sequences. Each dot indicates a match of 1 kbp between both sequences, while the color indicates the similarity of the matching sequences. A red line highlights an inversion between Nd-1 and Col-0 or Ler and Col-0, respectively. A red arrow points at the position where the inversion alleles differ between Nd-1 and Ler. (a) Comparison of the Nd-1 genome sequence against the Col-0 gold standard sequence reveals a 1 Mbp inversion. (b) The Ler genome sequence displays another inversion allele [14].

There is a very low similarity region (light blue) between the sequences in region A and almost no similarity between the sequences in region B (white). The complete region between 3.29 Mbp and 3.48 Mbp on NdChr2 is missing in the Ler genome.

Differences between the Nd-1 and Col-0 genome sequences led to the discovery of a collapsed region in the Col-0 gold standard sequence. There are two copies of At4g22214 (blue) present in the Col-0 genome, while only one copy is represented in the Col-0 gold standard sequence. This gene duplication was initially validated through PCR with outward-facing oligonucleotides N258 and N259 (purple), which led to the formation of the expected PCR product (black). Parts of this region were cloned into plasmids (grey) for sequencing. Sanger and paired-end Illumina sequencing reads revealed one complete gene (At4g22214b) and a degenerated copy (At4g22214a). Moreover, the region downstream of the complete gene copy in Nd-1 indicates the presence of at least one additional degenerated copy.
Grants and funding
We acknowledge the financial support of the German Research Foundation (DFG, http://www.dfg.de) to BW (WE1576/16-2). We also acknowledge support for the Article Processing Charge by the DFG and the Open Access Publication Fund of Bielefeld University. The funding bodies did not influence the design of the study, the data collection, the analysis, the interpretation of data, or the writing of the manuscript.