A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set

Tue Jan 01 2019

Boas Pucker et al. PLoS One. 2019.

Abstract

In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short-read assemblies of the plant model organism Arabidopsis thaliana have been published in recent years. A SMRT-based assembly of Landsberg erecta has also been generated, which identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short read-based NGS assembly (AthNd-1_v1), AthNd-1_v2c shows a 75-fold increase in contiguity. To assign contig locations independently of the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Detailed analyses of AthNd-1_v2c allowed reliable identification of large genomic rearrangements between A. thaliana accessions, which contribute to differences in the gene sets that distinguish the genotypes. One of the detected differences revealed a gene that is missing from the Col-0 gold standard sequence. This de novo assembly extends the known proportion of the A. thaliana pan-genome.
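
The abstract does not say which metric underlies the reported increase in contiguity. A common choice for comparing assemblies is N50, the contig length at which contigs of that length or longer cover at least half of the assembly. The following is a minimal sketch under that assumption; the function names `n50` and `contiguity_fold_change` are hypothetical, and the lengths in the usage note are toy values, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def contiguity_fold_change(new_lengths, old_lengths):
    """Ratio of the N50 of a new assembly to the N50 of an older one.

    Assumption: contiguity is compared via N50; the source does not
    confirm which statistic was used.
    """
    return n50(new_lengths) / n50(old_lengths)
```

For example, `contiguity_fold_change([80, 20], [4, 4, 2])` compares two toy assemblies of different total length and contig count.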

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Nd-1 genome structure.

Schematic pseudochromosomes are represented by black lines, with the positions of genomic features highlighted by colored icons as indicated in the inset.

Fig 2. Inversion on chromosome 4.

The dotplot heatmaps show the similarity between small fragments of two sequences. Each dot indicates a 1 kbp match between the two sequences, and its color indicates the similarity of the matching sequences. A red line highlights an inversion between Nd-1 and Col-0 or between Ler and Col-0, respectively. A red arrow points to the position where the inversion alleles of Nd-1 and Ler differ. (a) Comparison of the Nd-1 genome sequence with the Col-0 gold standard sequence reveals a 1 Mbp inversion. (b) The Ler genome sequence displays another inversion allele [14].

Fig 3. Highly divergent region on chromosome 2.

The sequences show very low similarity in region A (light blue) and almost no similarity in region B (white). The complete region between 3.29 Mbp and 3.48 Mbp on NdChr2 is missing in the Ler genome.

Fig 4. Hidden locus in the Col-0 reference sequence.

Differences between the Nd-1 and Col-0 genome sequences led to the discovery of a collapsed region in the Col-0 gold standard sequence. Two copies of At4g22214 (blue) are present in the Col-0 genome, while only one copy is represented in the Col-0 gold standard sequence. This gene duplication was initially validated through PCR with the outward-facing oligonucleotides N258 and N259 (purple), which led to the formation of the expected PCR product (black). Parts of this region were cloned into plasmids (grey) for sequencing. Sanger and paired-end Illumina sequencing reads revealed one complete gene (At4g22214b) and a degenerated copy (At4g22214a). Moreover, the region downstream of the complete gene copy in Nd-1 indicates the presence of at least one additional degenerated copy.
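
The dotplot idea described in the Fig 2 caption can be sketched in a few lines. This is a simplified illustration, not the authors' tool: it uses exact matching of fixed-size fragments rather than a similarity score, and the function names `reverse_complement` and `dotplot_matches` are hypothetical. Fragments that match on the reverse strand are reported with strand "-", which is how an inverted segment shows up in such a comparison.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string (A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dotplot_matches(seq_a, seq_b, window):
    """List (pos_a, pos_b, strand) for fragments of seq_a found in seq_b.

    seq_a is cut into non-overlapping fragments of `window` bases.
    Each fragment is looked up in seq_b on the forward strand ("+")
    and as its reverse complement ("-"). Exact matching is a
    simplification; a real dotplot would score approximate similarity.
    """
    # Index every window of seq_b by its sequence for fast lookup.
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)

    hits = []
    for i in range(0, len(seq_a) - window + 1, window):
        fragment = seq_a[i:i + window]
        for j in index.get(fragment, []):
            hits.append((i, j, "+"))
        for j in index.get(reverse_complement(fragment), []):
            hits.append((i, j, "-"))
    return hits
```

Plotting `pos_a` against `pos_b` for all hits gives the dotplot; in this sketch the strand label takes the place of the color scale used in the figure.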

References

    1. Koornneef M, Meinke D. The development of Arabidopsis as a model plant. The Plant Journal. 2010;61(6):909–21. doi:10.1111/j.1365-313X.2009.04086.x
    2. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815. doi:10.1038/35048692
    3. Kumekawa N, Hosouchi T, Tsuruoka H, Kotani H. The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 5. DNA Research. 2000;7(6):315–21. doi:10.1093/dnares/7.6.315
    4. Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, et al. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nature Genetics. 2013;45(8):884–90. Epub 2013 Jun 23. doi:10.1038/ng.2678
    5. Vukašinović N, Cvrčková F, Eliáš M, Cole R, Fowler JE, Žárský V, et al. Dissecting a hidden gene duplication: the Arabidopsis thaliana SEC10 locus. PLoS ONE. 2014;9(4):e94077. doi:10.1371/journal.pone.0094077

Grants and funding

We acknowledge the financial support of the German Research Foundation (DFG, http://www.dfg.de) to BW (WE1576/16-2). We also acknowledge support for the Article Processing Charge by the DFG and the Open Access Publication Fund of Bielefeld University. The funding bodies did not influence the design of the study, the data collection, the analysis, the interpretation of data, or the writing of the manuscript.