Tue Jan 01 2019

A homology-guided, genome-based proteome for improved proteomics in the alloploid Nicotiana benthamiana

Jiorgos Kourelis et al. BMC Genomics. 2019.

Abstract

Background: Nicotiana benthamiana is an important model organism of the Solanaceae (Nightshade) family. Several draft assemblies of the N. benthamiana genome have been generated, but many of the gene-models in these draft assemblies appear incorrect.

Results: Here we present an improved proteome based on the Niben1.0.1 draft genome assembly guided by gene models from other Nicotiana species. Due to the fragmented nature of the Niben1.0.1 draft genome, many protein-encoding genes are missing or partial. We complement these missing proteins by similarly annotating other draft genome assemblies. This approach overcomes problems caused by mis-annotated exon-intron boundaries and mis-assigned short read transcripts to homeologs in polyploid genomes. With an estimated completeness of 98.1%, only 53,411 protein-encoding genes, and improved protein lengths and functional annotations, this new predicted proteome is better at assigning spectra than the preceding proteome annotations. This dataset is more sensitive and accurate in proteomics applications, clarifying the detection by activity-based proteomics of proteins that were previously predicted to be inactive. Phylogenetic analysis of the subtilase family of hydrolases reveals inactivation of likely homeologs, associated with a contraction of the functional genome in this alloploid plant species. Finally, we use this new proteome annotation to characterize the extracellular proteome as compared to a total leaf proteome, which highlights the enrichment of hydrolases in the apoplast.

Conclusions: This proteome annotation provides the community working with Nicotiana benthamiana with an important new resource for functional proteomics.

Keywords: Genome annotation; Nicotiana benthamiana; Proteomics; Solanaceae; Subtilases.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1

Bioinformatics pipeline for improved Nicotiana benthamiana proteome annotation. The predicted proteins of Nicotiana species generated by the NCBI Eukaryotic Genome Annotation Pipeline were retrieved from GenBank and clustered at a 95% identity threshold to reduce redundancy (Step 1), and used to annotate the Niben1.0.1 genome assembly (Step 2). Only those proteins with an alignment coverage ≥60% to the Nicotiana predicted proteins as determined by BLASTP were retained (Step 3) to produce the NbD core dataset. Similarly, the other draft genome assemblies were annotated (Step 4), and only those proteins with an alignment coverage ≥90% to the Nicotiana predicted proteins as determined by BLASTP were retained (Step 5). CD-HIT-2D was used at 100% sequence identity to retain proteins missing from the NbD dataset (Step 6), resulting in the supplemental dataset NbE. NbD and NbE can be combined (NbDE) to maximise the spectra annotation for proteomics experiments.
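
The coverage filters in Steps 3 and 5 can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a hypothetical tab-separated BLASTP table with the columns qseqid, sseqid, slen, sstart, send, and it assumes that "alignment coverage" means the aligned span on the reference (subject) protein divided by the reference length. The caption does not spell out the definition, so treat both choices as illustrative. The function names `best_coverage` and `retain` are likewise made up for this example.

```python
from typing import Dict, Iterable, Set


def best_coverage(blast_lines: Iterable[str]) -> Dict[str, float]:
    """Return the best reference coverage (%) observed for each query protein.

    Assumed (hypothetical) columns, tab-separated:
    qseqid, sseqid, slen, sstart, send
    """
    best: Dict[str, float] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, slen, sstart, send = line.split("\t")[:5]
        aligned = abs(int(send) - int(sstart)) + 1  # aligned span on the reference
        coverage = 100.0 * aligned / int(slen)
        best[qseqid] = max(best.get(qseqid, 0.0), coverage)
    return best


def retain(blast_lines: Iterable[str], min_coverage: float) -> Set[str]:
    """Keep query proteins whose best coverage meets the threshold."""
    return {q for q, cov in best_coverage(blast_lines).items() if cov >= min_coverage}


# Made-up hits, for illustration only.
hits = [
    "protA\tref1\t200\t1\t190",   # 190/200 of the reference aligned
    "protB\tref2\t300\t50\t200",  # 151/300 of the reference aligned
    "protC\tref3\t100\t1\t70",    # 70/100 of the reference aligned
]
core = retain(hits, 60.0)          # threshold used for the core dataset (Step 3)
supplemental = retain(hits, 90.0)  # stricter threshold used in Step 5
```

With these made-up hits, the 60% threshold keeps protA and protC, while the 90% threshold keeps only protA.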

Fig. 2

Increased lengths, coverage and annotation of N. benthamiana proteins. a NbD/NbDE datasets have relatively few entries when compared to preceding datasets. b NbD/NbDE datasets contain nearly all benchmark genes as full-length genes, according to Benchmarking Universal Single-Copy Orthologs (BUSCO) of embryophyta. c The NbD/NbDE datasets have a higher number of annotated PFAM domains. d NbD/NbDE datasets have relatively longer proteins. Violin and boxplot graph of the log10 protein length distribution of each dataset. Jittered dots show the raw underlying data. e NbD/NbDE annotated proteins have a higher percentage coverage to the tomato proteins as determined by BLASTP.

Fig. 3

NbD/NbDE datasets outperform preceding datasets in the annotation of spectra in proteomics. a Percentage of annotated MS/MS spectra in total leaf extract samples. b Average number of unique peptides assigned per protein in the different datasets. a and b Means and standard error of the mean are shown for four biological replicates of total leaf extracts. c Mis-annotations of papain-like Cys proteases (PLCPs) detected by activity-based protein profiling [54]. Leaf extracts were labelled with activity-based probes for PLCPs, and labelled proteins were purified and analysed by MS. Shown are the protein annotations found in the NbDE (top), Niben1.0.1 (middle) and curated datasets (bottom), highlighting mis-annotations (red) caused by partial transcripts, mis-annotation of exon-intron boundaries, and mis-assemblies.
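
For readers less familiar with the summary statistics in panels a and b, the following short Python sketch shows how a mean and standard error of the mean (SEM) are obtained from four replicates. The replicate values are invented for illustration and are not data from the paper. The SEM is taken here as the sample standard deviation divided by the square root of n, which is the usual convention but is an assumption about how the figure was computed.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for a set of replicates."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical percentages of annotated spectra in four biological replicates.
replicates = [40.0, 42.0, 44.0, 46.0]
mean, sem = mean_and_sem(replicates)
```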

Fig. 4

Examples of subtilase mis-annotations in the different genome assemblies. a Gene-models corresponding to subtilase NbE05066806 and the corresponding annotations in the various datasets. This subtilase gene is fragmented in Niben1.0.1; truncated in Nbv0.5; and carries two SNPs and an extra sequence in Niben0.4.4. b Gene-models corresponding to subtilase NbE03059263 and the corresponding annotations in the various datasets. This subtilase has an inactivated homeolog (dark grey) that was not retained in the NbDE dataset as it encodes a protein with < 60% coverage because it contains premature stop codons. The truncated proteins caused mis-assembly in the Niben1.0.1 dataset, resulting in a hybrid sequence. Mis-annotated exon-intron boundaries also affected gene models in Niben1.0.1, Niben0.4.4 and Nbv5.1. Peptides matched to the different gene models are indicated below the gene models.

Fig. 5

Birth and death of subtilase paralogs in N. benthamiana. The evolutionary history of the subtilase gene family was inferred by using the Maximum Likelihood method based on the Whelan and Goldman model. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analysed. Non-functional subtilases are indicated in grey. Subtilases identified in apoplastic fluid (AF) and/or total extract (TE) are indicated with yellow and green dots, respectively. Naming of subtilase clades is according to [51]. Additional file 1: Figure S2 includes the individual names.

Fig. 6

Annotation of the N. benthamiana apoplastic proteome. a Correlation matrix heat map of the log2 transformed LFQ intensity of protein groups in the four biological replicates of apoplastic fluid (AF) and total extract (TE) samples. Biological replicates are clustered on similarity. b Volcano plot of the log2 fold difference (log2FC) of AF/TE against the –log10 BH-adjusted moderated p-values. Proteins with log2FC ≥ 1.5 and p ≤ 0.01 were considered apoplastic, as well as those only found in AF. Conversely, proteins with log2FC ≤ −1.5 and p ≤ 0.01 were considered intracellular, as well as those found only in TE. c Percentage of proteins in each fraction annotated with biological process-associated GO-SLIM terms. d Percentage of proteins in each fraction annotated with molecular function-associated GO-SLIM terms. c and d GO-SLIM annotations are shown when significantly enriched or depleted (BH-adjusted hypergeometric test, p < 0.05) in at least one of the fractions (AF, TE, or both). Each bubble indicates the percentage of genes containing that specific GO-SLIM annotation in that compartment. Colours indicate whether the GO-SLIM annotations are enriched or depleted in that fraction (p < 0.05; n.s., non-significant).
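
The fraction assignment described for panel b can be sketched as a small decision rule in Python. This is an illustration, not the authors' analysis code. It assumes the intracellular cut-off mirrors the apoplastic one (log2FC ≤ −1.5), that the p-values passed in are already BH-adjusted, and that proteins meeting neither rule are labelled "both". The function name, argument names and labels are hypothetical.

```python
from typing import Optional


def classify_protein(
    log2fc: Optional[float],
    adj_p: Optional[float],
    in_af: bool = True,
    in_te: bool = True,
    fc_cutoff: float = 1.5,
    p_cutoff: float = 0.01,
) -> str:
    """Assign a protein to a fraction from its AF/TE log2 fold change.

    log2fc: log2(AF/TE); adj_p: BH-adjusted p-value (assumed precomputed).
    in_af / in_te: whether the protein was detected in that sample type.
    """
    # Proteins detected in only one sample type are assigned directly.
    if in_af and not in_te:
        return "apoplastic"
    if in_te and not in_af:
        return "intracellular"
    if log2fc is None or adj_p is None:
        return "unclassified"
    if adj_p <= p_cutoff and log2fc >= fc_cutoff:
        return "apoplastic"
    # Assumed mirror-image threshold for intracellular proteins.
    if adj_p <= p_cutoff and log2fc <= -fc_cutoff:
        return "intracellular"
    return "both"


# Made-up values, for illustration only.
labels = [
    classify_protein(2.0, 0.001),
    classify_protein(-2.0, 0.001),
    classify_protein(2.0, 0.05),
    classify_protein(None, None, in_af=True, in_te=False),
]
```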

References

    1. Bally J, Nakasugi K, Jia F, Jung H, Ho SYW, Wong M, Paul CM, Naim F, Wood CC, Crowhurst RN, Hellens RP, Dale JL, Waterhouse PM. The extremophile Nicotiana benthamiana has traded viral defence for early vigour. Nat Plants. 2015;1:15165. doi: 10.1038/nplants.2015.165.
    2. Bombarely A, Moser M, Amrad A, Bao M, Bapaume L, Barry CS, Bliek M, Boersma MR, Borghi L, Bruggmann R, Bucher M, D’Agostino N, Davies K, Druege U, Dudareva N, Egea-Cortines M, Delledonne M, Fernandez-Pozo N, Franken P, Grandont L, Heslop-Harrison JS, Hintzsche J, Johns M, Koes R, Lv X, Lyons E, Malla D, Martinoia E, Mattson NS, Morel P, Mueller LA, Muhlemann J, Nouri E, Passeri V, Pezzotti M, Qi Q, Reinhardt D, Rich M, Richert-Pöggeler KR, Robbins TP, Schatz MC, Schranz ME, Schuurink RC, Schwarzacher T, Spelt K, Tang H, Urbanus SL, Vandenbussche M, Vijverberg K, Villarino GH, Warner RM, Weiss J, Yue Z, Zethof J, Quattrocchio F, Sims TL, Kuhlemeier C. Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida. Nat Plants. 2016;2:16074. doi: 10.1038/nplants.2016.74.
    3. Bombarely A, Rosli HG, Vrebalov J, Moffett P, Mueller LA, Martin GB. A draft genome sequence of Nicotiana benthamiana to enhance molecular plant-microbe biology research. Mol Plant-Microbe Interact. 2012;25:1523–1530. doi: 10.1094/MPMI-06-12-0148-TA.
    4. Casimiro-Soriguer CS, Muñoz-Mérida A, Pérez-Pulido AJ. Sma3s: a universal tool for easy functional annotation of proteomes and transcriptomes. Proteomics. 2017;17. doi: 10.1002/pmic.201700071.
    5. Clarkson JJ, Dodsworth S, Chase MW. Time-calibrated phylogenetic trees establish a lag between polyploidisation and diversification in Nicotiana (Solanaceae). Plant Syst Evol. 2017. doi: 10.1007/s00606-017-1416-9.
