
Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China



KEYWORDS: virophage, large green alga virus, Phycodnaviridae, Mimiviridae, CRISPR-Cas-like system

ABSTRACT

Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses that infect unicellular eukaryotes. Apart from a few isolated virophages whose parasitization mechanisms have been characterized, the features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, were identified through environmental metagenomic investigation. Genomic and phylogenetic analyses indicate that the Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and that the Dishui Lake large DNA viruses are affiliated either with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) or with the protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with close homologs in large alga viruses than in giant protozoan viruses. Furthermore, codon usage, oligonucleotide frequency, and correlation analyses show that the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1. Surprisingly, a nonhomologous CRISPR-Cas-like system was found in DSLLAV1, which appears to protect DSLLAV1 from parasitization by DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems, comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages, exist in Dishui Lake. They lay the groundwork for in-depth investigation of the evolutionary interactions between virophages and large alga viruses and of the essential roles that CVv systems play in algal ecology.

IMPORTANCE Virophages are small viruses that parasitize large/giant viruses. To our knowledge, the few isolated virophages all depend on giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with the cellular hosts, here named the cell-virus-virophage (CVv) system. However, CVv systems remain largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China, and found four novel large alga viruses and seven virophages coexisting in the lake. Surprisingly, genetic link, genomic signature, and CRISPR system analyses identified a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages. In addition, a nonhomologous CRISPR-like system was found in a Dishui Lake large alga virus, which appears to protect this virus host from infection by Dishui Lake virophages (DSLVs). These findings provide insight into the potential significance of CVv systems in global evolution and ecology.

INTRODUCTION

Viruses are obligate parasites of Eukarya, Archaea, and Bacteria. The discovery of Acanthamoeba polyphaga Mimivirus (APMV) (1) and other giant viruses of protozoa challenged the traditional definition of viruses (2–11). In 2008, a small double-stranded DNA (dsDNA) virus with 50-nm icosahedral virions was observed together with the giant mamavirus (Acanthamoeba castellanii mamavirus [ACMV]), a sister strain of APMV, in the viral factory and cytoplasm of an amoeba host (12). It was named the Sputnik virophage because of its negative impact on its giant virus host, such as an increase in abnormal viral particles and a decrease in the infectivity or lytic ability of the giant virus. Shortly afterward, Sputnik 2 (13) and Sputnik 3 (14) were isolated; both share more than 99% genomic sequence identity with Sputnik and have broad virus host ranges among the Mimiviridae. In 2011, a new virophage, Mavirus (15), was discovered in association with the giant Cafeteria roenbergensis virus (CroV) (16), which infects a marine heterotrophic protozoan. Soon afterward, the Zamilon virophage (17) was found to be associated with the giant Mont1 virus in Acanthamoeba polyphaga cells. The continued identification of virophages associated with Mimiviridae viruses and their protozoan hosts indicates that cell-virus-virophage (CVv) tripartite infection systems are far more diverse and widespread than expected.

Interestingly, Yau and coauthors (18) identified a virophage genome (Organic Lake virophage [OLV]) and two large alga virus genomes (Organic Lake phycodnaviruses [OLPVs]) in the metagenome from Organic Lake, a hypersaline lake in eastern Antarctica. Lotka-Volterra simulations suggested that OLV strongly affects the replication of OLPVs and the growth of their algal hosts. In our previous work, the complete genomes of seven novel virophages (Yellowstone Lake virophages [YSLVs]) (19, 20), along with four novel algal viruses (Yellowstone Lake phycodnaviruses [YSLPVs] and Yellowstone Lake giant virus [YSLGV]) (21), were discovered in metagenomes from Yellowstone Lake, a hydrothermal freshwater lake, and a nearly complete virophage sequence (Ace Lake Mavirus [ALM]) (19) was obtained from the hypersaline meromictic Ace Lake in Antarctica. Moreover, by comprehensively analyzing global environmental metagenomic data sets, we demonstrated the global distribution, abundance, and genetic diversity of virophages (19). These findings suggest that additional, as-yet-uncharacterized CVv tripartite infection systems are widely distributed in natural environments, especially in freshwater ecosystems. Subsequently, a virophage-like sequence (Phaeocystis globosa virus virophage [PgVV]) was identified in the genome of Phaeocystis globosa virus PgV-16T (22), highlighting the possibility that a large alga virus can serve as a viral host for virophages. Additionally, phylogenetic analysis of shared core genes places PgV-16T, YSLGV, and the OLPVs in the group of alga-infecting mimiviruses referred to as Mimiviridae group III, which is closely related to the protozoan-infecting Mimiviridae groups I and II, in which the CVv tripartite infection system was first identified. Recently, virophage-like sequence elements were detected in, and shown to be transcribed in, strains of the chlorarachniophyte alga Bigelowiella natans (23).
Taken together, these results suggest that large/giant alga viruses may be widely involved in CVv tripartite infection systems, in addition to the known CVv systems based on giant protozoan viruses, such as Acanthamoeba polyphaga-ACMV-Sputnik, Cafeteria roenbergensis-CroV-Mavirus, and Acanthamoeba polyphaga-Mont1 virus-Zamilon.

To test this hypothesis, we previously recovered the complete genomic sequences of a large alga virus (Dishui Lake phycodnavirus 1 [DSLPV1]) (24) and a virophage (Dishui Lake virophage 1 [DSLV1]) (25) from metagenomic data sets from the freshwater lake Dishui Lake (DSL), Shanghai, China, suggesting that DSL is a promising site for discovering tripartite infection systems. In this study, four novel large alga viruses and seven virophages were first discovered in DSL through metagenomic, genomic, and phylogenetic analyses. They were then subjected to genetic link and genomic signature analyses together with known large/giant viruses and virophages. Finally, a unique CRISPR-Cas-like system was identified in a large alga virus from DSL, which seems to protect this virus host from infection by DSL virophages. Together, these findings suggest the presence of novel CVv tripartite infection systems comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages in DSL, paving the way for in-depth investigation of their interaction mechanisms and coevolution and of their potential significance in global ecology.

RESULTS

Genomic architecture of DSLVs.

A total of 780,575 contigs, 200 to 197,862 bp long, were obtained after de novo assembly. Among them, 12 distinct contigs showed similarity to known virophage major capsid proteins (MCPs) (E value cutoff, 10⁻⁵), suggesting a high diversity of virophages in DSL. Ultimately, the complete genomic sequences of seven virophages were assembled from the DSL metagenomic data sets. They were named Dishui Lake virophages 2, 3, 4, 5, 6, 7, and 8 (DSLV2-8); their genome lengths are 31,238, 31,512, 30,873, 26,593, 28,714, 29,961, and 26,605 bp, respectively, with G+C contents ranging from 32.3 to 45.2% (Fig. 1). The DSLVs encode 38 (DSLV2), 31 (DSLV3), 32 (DSLV4), 25 (DSLV5), 29 (DSLV6), 30 (DSLV7), and 25 (DSLV8) predicted open reading frames (ORFs) (see Tables S1 to S7 in the supplemental material); DSLV2 contains the most ORFs of all DSLVs and known virophages. All of these virophage genomes are circular except that of DSLV3 (Fig. 1). The DSLV3 genome contains palindromic repeats at asymmetric positions at its two ends, resembling those found in large/giant viruses with linear genomes, such as Mimivirus (26).
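The MCP screening step can be illustrated with a minimal Python sketch. It assumes the similarity hits are available as BLAST tabular output (`-outfmt 6`), whose 11th column is the E value, and it treats the 10⁻⁵ threshold as inclusive; the contig and subject IDs below are hypothetical, and this is not the authors' actual pipeline.

```python
"""Count distinct contigs with a significant hit to a virophage MCP (illustrative sketch).

Assumes BLAST tabular output (-outfmt 6) with the 12 default columns:
query, subject, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
"""

E_VALUE_CUTOFF = 1e-5  # threshold from the text; treating it as inclusive is an assumption


def contigs_with_mcp_hits(blast_tsv_lines, cutoff=E_VALUE_CUTOFF):
    """Return the set of query contig IDs with at least one hit at or below the cutoff."""
    hits = set()
    for line in blast_tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])  # column 11 holds the E value
        if evalue <= cutoff:
            hits.add(query_id)
    return hits


# Hypothetical example rows (IDs are made up for illustration).
example = [
    "contig_A\tMCP_ref1\t35.2\t300\t190\t4\t1\t900\t1\t300\t2e-40\t150",
    "contig_A\tMCP_ref2\t30.1\t280\t190\t6\t1\t840\t5\t290\t1e-20\t95",
    "contig_B\tMCP_ref1\t22.0\t120\t90\t3\t10\t370\t40\t160\t0.3\t30",
    "contig_C\tMCP_ref3\t28.4\t250\t170\t5\t1\t750\t1\t250\t4e-8\t60",
]
print(sorted(contigs_with_mcp_hits(example)))  # ['contig_A', 'contig_C']
```

Collecting query IDs in a set means that a contig with several MCP hits is counted once, which matches the notion of "distinct contigs" used above.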

FIG 1.


Physical maps of the DSLV genomes. Numbers outside the genomes indicate nucleotide positions. ORFs are indicated by arrows, colored by functional category, and numbered in order. The inner zigzag blue line denotes the G+C content of the genomes. Viral names and genome lengths are shown in the center of each map. The linear genome of DSLV3 is shown as an open circle.

Four virophage core genes, encoding the packaging ATPase, the cysteine protease (PRO), the major capsid protein (MCP), and the minor capsid protein (mCP), were identified in DSLV2 to 8. A conserved virophage gene encoding a putative DNA helicase (HEL) was also found in all of these genomes except DSLV2 and DSLV3.

DSLVs share more homologs with freshwater virophages than with other known virophages, especially with YSLV3/4 from Yellowstone Lake and Mendota 1002791/23267002401 from Lake Mendota (Fig. 2A), suggesting a close relationship between them.

FIG 2.


(A) Heatmap showing homologous genes shared between DSLVs and other known virophages. Rows represent the homologous genes of DSLVs, and columns represent those of other known virophages. The red color gradient indicates the amino acid identity of each pair of homologous genes. (B) Maximum-likelihood phylogenetic tree of DSLVs. Bootstrap values greater than 50 are shown at the nodes. Genomes from isolates and genomes assembled from environmental metagenomes are indicated by solid and dashed lines, respectively. DSLVs are shown in boldface. Scale bar, 0.5 amino acid substitutions per site. OLV, Organic Lake virophage; QLV, Qinghai Lake virophage; TBE, Trout Bog epilimnion; TBH, Trout Bog hypolimnion; YSLV, Yellowstone Lake virophage; RNV, Rio Negro virophage.

Intriguingly, a set of homologous genes was shared between DSLVs and members of the Phycodnaviridae, a family of nucleocytoplasmic large DNA viruses (NCLDVs) that infect eukaryotic algae. For example, DSLV6 ORF15 shares 51% amino acid identity with the repeat domain-containing glycoprotein of the large green alga virus Paramecium bursaria Chlorella virus OR0704.2.2 (Table S5). However, none of the DSLVs shared genes with close homologs in the known protozoan-infecting Mimiviridae viruses.
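Amino acid identities such as the 51% value above, and those summarized in the Fig. 2A heatmap, are computed from pairwise alignments. The sketch below assumes two sequences that have already been aligned and counts identical residues over columns in which neither sequence has a gap. That choice of denominator is an assumption (tools differ; some divide by the full alignment length, including gap columns), and the sequences are made-up examples.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Identity is computed over columns where neither sequence has a gap ("-");
    this is one of several common definitions.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap in either sequence.
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment: 8 ungapped columns, 6 of them identical.
print(percent_identity("MKT-LLVAG", "MKSALLIAG"))  # 75.0
```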

Phylogenetic analysis of DSLVs.

According to the tree topology (Fig. 2B), DSLVs fell into three distinct clades with strong bootstrap support (94 to 100%), together with other lake virophages, e.g., YSLVs and Mendota virophages. The three pairs DSLV1/7, DSLV4/6, and DSLV5/8 are each other's closest relatives and are much more closely affiliated with one another than with DSLV2 and 3, in good agreement with the heatmap in Fig. 2A. All DSLVs were distantly related to the virophages that parasitize giant protozoan viruses, suggesting early divergence as well as distinct CVv infection systems.

Genomic architecture of DSL large alga viruses.

These 780,757 contigs from the DSL metagenomic assemblies were screened for large alga virus-related sequences using BLASTp searches and MEGAN clustering analysis. As a result, 17 contigs longer than 10 kb were confirmed to be large alga virus-related sequences (Table 1). Ultimately, four complete genomic sequences of large alga viruses were assembled from the DSL metagenomic data sets and named Dishui Lake phycodnavirus 2 (DSLPV2; 190,426 bp), Dishui Lake phycodnavirus 3 (DSLPV3; 194,169 bp), Dishui Lake phycodnavirus 4 (DSLPV4; 193,666 bp), and Dishui Lake large alga virus 1 (DSLLAV1; 392,531 bp).
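The length criterion and the per-contig statistics reported in Table 1 (sequence length and G+C content) amount to a simple filter. The following is a minimal sketch, assuming contigs are held in a dictionary of ID to nucleotide sequence, that ">10 kb" means more than 10,000 bp, and that G+C content is calculated over unambiguous bases only; the contig IDs and sequences are toy examples, not DSL data.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize_long_contigs(contigs, min_len=10_000):
    """Return (contig ID, length in bp, G+C %) for contigs longer than min_len, longest first."""
    rows = [
        (contig_id, len(seq), round(gc_content(seq), 1))
        for contig_id, seq in contigs.items()
        if len(seq) > min_len
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy contigs (hypothetical).
toy_contigs = {
    "contig_X": "ATGC" * 3000,  # 12,000 bp, 50% G+C
    "contig_Y": "AATG" * 2600,  # 10,400 bp, 25% G+C
    "contig_Z": "ATGC" * 1000,  # 4,000 bp, below the length cutoff
}
for row in summarize_long_contigs(toy_contigs):
    print(row)
```

In practice, the candidate set would first be restricted to contigs assigned to large alga viruses (by BLASTp and MEGAN, as described above); the sketch covers only the length and composition step.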

TABLE 1.

Large/giant alga virus-related contigs

Contig IDᵃ Sequence length (bp) G+C content (%)
Contig 07011 18,451 36.1
Contig 03034 13,453 36.2
Contig 14604 15,358 36.9
Contig 04946 28,336 38.4
Contig 07784 17,218 44.3
Contig 11676 15,108 41.3
Contig 08686 11,383 41.9
Contig 25154 10,133 42.0
Contig 04652 17,126 43.4
Contig 02445 26,955 45.6
Contig 02442 16,322 45.7
Contig 13390 14,906 46.8
Contig 10213 19,415 47.2
Contig 02448 26,811 47.5
Contig 02230 10,107 48.5
Contig 04138 26,394 40.9
Contig 04263 24,815 55.0

Inverted repeats were identified at both ends of the DSLPV2 and DSLPV3 genomes, indicating that these genomes are complete and linear, whereas DSLPV4 has a circular genome (Fig. 3). The genomes of DSLPV2, 3, and 4 have G+C contents of 47.6%, 46.5%, and 38.3% and encode 247, 236, and 220 predicted ORFs, respectively. BLASTp searches showed that 188 ORFs (∼76%) in DSLPV2, 204 ORFs (∼86%) in DSLPV3, and 172 ORFs (∼78%) in DSLPV4 were similar to proteins in the NCBI nonredundant (nr) database. Most of these ORFs are related to large/giant viruses in the NCLDVs, especially to members of the genus Prasinovirus in the Phycodnaviridae (164, 190, and 149 ORFs in DSLPV2, 3, and 4, respectively), indicating a close relationship of DSLPV2 to 4 with Prasinovirus members (Tables S8 to S10).
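The genome-topology calls above rest on what is found at the contig ends. The following is a deliberately simplified heuristic, not the authors' method: it compares the first and last k bases of an assembled genome exactly, reading a direct terminal repeat as a sign of a circular genome (the assembler has walked around the circle) and a reverse-complement match as a terminal inverted repeat of a linear genome. The value of k and the toy sequences are assumptions; real analyses must tolerate mismatches and variable repeat lengths.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_ends(seq: str, k: int = 50) -> str:
    """Classify genome ends by comparing the first and last k bases exactly (heuristic).

    'circular'     -> the last k bases repeat the first k (direct terminal overlap)
    'linear_tir'   -> the last k bases are the reverse complement of the first k
    'undetermined' -> neither pattern, or the sequence is shorter than 2k
    """
    seq = seq.upper()
    if len(seq) < 2 * k:
        return "undetermined"
    head, tail = seq[:k], seq[-k:]
    if tail == head:
        return "circular"
    if tail == reverse_complement(head):
        return "linear_tir"
    return "undetermined"


# Toy sequences built around a 10-bp terminal block (hypothetical).
head = "GATTACAGGC"
middle = "TTTTTCCCCCAAAAAGGGGG"
print(classify_ends(head + middle + head, k=10))                      # circular
print(classify_ends(head + middle + reverse_complement(head), k=10))  # linear_tir
print(classify_ends(head + middle + "A" * 10, k=10))                  # undetermined
```

A terminal block that is its own reverse complement would satisfy both tests; the sketch reports it as circular, which is one more reason to treat its output only as a first pass.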

FIG 3.


Physical maps of the DSLPV genomes. Numbers outside the genomes indicate nucleotide positions. ORFs are indicated by arrows and colored by the taxonomic category of their BLASTp top hits; the number of ORFs in each category is shown in the central pie charts. The inner zigzag blue lines denote the G+C content of the genomes. Linear genomes are shown as open circles, and their terminal inverted repeats are indicated by orange arrows.

Several DSLPV ORFs were also homologous to genes of green algae: DSLPV2 ORF7 and ORF237 to genes of Ostreococcus tauri (45% identity) and Raphidocelis subcapitata (36% identity), respectively; DSLPV3 ORF81 to a gene of Micromonas commoda (46% identity); and DSLPV4 ORF14 and ORF114 to genes of Ostreococcus tauri (44% and 35% identity, respectively) (Tables S8 to S10). In addition, five tRNA genes were identified in the DSLPV2 genome, seven in DSLPV3, and four in DSLPV4.

DSLLAV1 has a much larger genome than the DSLPVs: a 392,531-bp circular molecule. A total of 366 ORFs were predicted on both DNA strands. A BLASTp search showed that 158 ORFs (∼43%) shared no similarity with known proteins in the nr database, indicating that DSLLAV1 is a novel large virus. Of the remaining ORFs, 104 (∼28%) were similar to known viral proteins, 66 (∼18%) to eukaryotic proteins, 32 (∼9%) to bacterial proteins, and 6 (∼2%) to archaeal proteins (Fig. 4A). Notably, 25 of the 66 eukaryotic top hits were from Tetrabaena socialis, a green alga in the Chlorophyta (Table S11). Moreover, over 40% (44 of 104) of the viral top hits of DSLLAV1 ORFs were to large/giant viruses in the alga-infecting Mimiviridae group III, especially Tetraselmis virus 1 (16 ORFs), which infects Tetrabaena socialis and Tetraselmis members within the Chlorophyta (Fig. 4B). Consistent with this, the genomes of both DSLLAV1 and Tetraselmis virus 1 are circular DNA molecules with almost identical G+C contents (41.7% for DSLLAV1 and 41.2% for Tetraselmis virus 1). Additionally, 19 ORFs in DSLLAV1 are homologous to those of viruses in the family Phycodnaviridae (Fig. 4B), most of which infect Chlorophyta. Taken together, these results indicate that DSLLAV1 is most closely related to the Mimiviridae viruses that infect algae, especially green algae, and that its host is likely a chlorophyte green alga.
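As a quick consistency check on the counts above, the five ORF categories for DSLLAV1 sum to the 366 predicted ORFs, and the rounded percentages follow directly from them. The short Python snippet below only re-derives numbers already stated in the text; the category labels are shorthand introduced here.

```python
# ORF counts for DSLLAV1 by category of BLASTp top hit, as given in the text.
total_orfs = 366
counts = {
    "no similarity": 158,
    "viral": 104,
    "eukaryotic": 66,
    "bacterial": 32,
    "archaeal": 6,
}

# The five categories together account for every predicted ORF.
assert sum(counts.values()) == total_orfs

for category, n in counts.items():
    print(f"{category:>13}: {n:3d} ORFs ({100 * n / total_orfs:.0f}%)")

# Share of viral top hits that fall in Mimiviridae group III (44 of 104).
group_iii_share = 100 * 44 / 104
print(f"Mimiviridae group III share of viral hits: {group_iii_share:.1f}%")
```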

FIG 4.


(A) Physical map of the DSLLAV1 genome. Numbers outside the genome indicate nucleotide positions. ORFs are indicated by arrows and colored by the taxonomic category of their BLASTp top hits; the number of ORFs in each category is shown in the central pie chart. The inner zigzag blue line denotes the G+C content of the genome. (B) Taxonomic hierarchy of the viral top hits of ORFs. Rings from the inside to the outside represent taxonomic ranks from highest to lowest. Numbers in parentheses indicate the number of ORFs. PBCV, Paramecium bursaria Chlorella virus; YSLPV, Yellowstone Lake phycodnavirus; YSLGV, Yellowstone Lake giant virus; OtV, Ostreococcus tauri virus; DSLPV, Dishui Lake phycodnavirus; MpV, Micromonas pusilla virus; BpV, Bathycoccus sp. strain RCC1105 virus; EhV, Emiliania huxleyi virus; TetV, Tetraselmis virus; CeV, Chrysochromulina ericina virus; PgV, Phaeocystis globosa virus; AaV, Aureococcus anophagefferens virus; OLPV, Organic Lake phycodnavirus; PoV, Pyramimonas orientalis virus.

Phylogenetic analysis of DSLPVs and DSLLAV1.

DSLPV1, described previously (24), was also included in the phylogenetic analysis. As shown in Fig. 5, DSLPV3 grouped within the genus Prasinovirus (family Phycodnaviridae), whose members infect marine unicellular green algae. Moreover, most DSLPV3 genes (190 of the 204 ORFs with nr hits) are homologous to those of prasinoviruses rather than to those of other Phycodnaviridae viruses, and Mauve and BRIG alignment analyses showed high overall genomic sequence similarity between DSLPV3 and prasinoviruses (Fig. 6). These results suggest that DSLPV3 represents a new species in the genus Prasinovirus. In contrast, DSLPV2 and 4 were placed close to, but outside, the genus Prasinovirus, currently representing its closest relatives; together with DSLPV1, they share a common ancestor with the Yellowstone Lake phycodnaviruses (100% bootstrap support).

FIG 5.


The maximum-likelihood phylogenetic tree of DSLPVs and DSLLAV1. Green branches and labels represent large micro-green alga viruses, and dotted lines denote viruses found in metagenomic data sets. Other large alga viruses are indicated by brown branches and labels, and giant protozoan viruses by blue branches and labels. DSLPVs and DSLLAV1 are shown in boldface. Circles preceding viral names indicate circular genomes; the remaining genomes are linear. Scale bar, 0.2 amino acid substitutions per site. PV, Prasinovirus; CV, Chlorovirus; I, Mimiviridae group I; II, Mimiviridae group II; III, Mimiviridae group III. Bootstrap values (100 replicates) greater than 50 are shown at the nodes.

FIG 6.


(A) Multiple genome alignment of DSLPV1, DSLPV2, DSLPV3, DSLPV4, BpV1, MpV1, OtV1, and OtV1 (for full names, refer to Fig. 4). Conserved sequence blocks are shown in the same color and linked across the genomes with the lines in the same color as the corresponding sequence blocks. (B) Nucleotide identity plot of genomic sequences shared between DSLPV3 and each specific virus (DSLPV1, DSLPV2, DSLPV4, BpV1, MpV1, OtV1, and OtV1) based on BLASTn. The inner circle in turquoise represents the DSLPV3 genome, and different colored circles toward the outside represent the other viral genomes.

Surprisingly, DSLLAV1 formed a deeply branching lineage relative to both the protozoan-infecting Mimiviridae viruses and the Mimiviridae-related alga-infecting large viruses (Mimiviridae group III), including OLPVs and PgV-16T (Fig. 5). This suggests that DSLLAV1 is an early-diverging member of the Mimiviridae, in agreement with the gene content analyses of DSLLAV1 described above.

Genetic links between DSLVs and large/giant viruses.

After redundancy removal, 10 DSLV ORFs (from DSLV1 to 8) were homologous to genes of known large/giant alga viruses (amino acid identity, 24% to 51%), and 5 were homologous to genes of DSLPV3, DSLPV4, and DSLLAV1 (amino acid identity, 21% to 45%) (Fig. 7A and Table 2). Although five genes were also shared between DSLVs and known giant protozoan viruses, all of them were distant homologs (Fig. 7A). Importantly, four DSLV ORFs (DSLV1-25, DSLV4-10, DSLV6-15, and DSLV7-5) had their closest homologs in the large green alga viruses Paramecium bursaria Chlorella virus and DSLPV4 (identity, >40%; E value, <10⁻²⁰) (Fig. 7A), whereas no DSLV ORF had its closest homolog in a giant protozoan virus. Further phylogenetic analysis of DSLV6 ORF15, one of these four genes, showed that it clusters with homologs from large green alga viruses rather than from giant protozoan viruses (Fig. 7C).

FIG 7.


Genetic links between virophages and large/giant viruses. (A) Scatterplots of homologous genes shared between DSLVs and large/giant protist viruses, including DSLPVs/DSLLAV1. Dots in different colors denote homologous genes shared between DSLVs and large/giant protist viruses after redundancy removal. Triangles denote the closest homologous genes, without redundancy removal, shared between DSLVs and large/giant alga viruses. The names of specific virophages and large/giant alga viruses are shown in the text tags. (B) Scatterplots of homologous genes shared between known virophages (Sputnik, Mavirus, Zamilon, and Rio Negro virophages) and large/giant protist viruses. Dots in different colors denote homologous genes shared between known virophages and large/giant protist viruses after redundancy removal, while triangles denote the closest homologous genes, without redundancy removal, shared between known virophages and giant protozoan viruses. The names of specific virophages and giant protozoan viruses are shown in the text tags. (C) Neighbor-joining phylogenetic tree of the homologous gene (DSLV6 ORF15) shared between DSLVs and large/giant protist viruses, including DSLPV4. Green branches and labels denote large micro-green alga viruses, and blue branches and labels represent giant protozoan viruses. DSLV6 is shown with a black branch and font. Dotted lines denote viruses found in metagenomic data sets. Bootstrap values (100 replicates) greater than 50 are shown at the nodes. (D) Neighbor-joining phylogenetic tree of the homologous gene (Sputnik V6) shared between known virophages (Sputnik and Zamilon) and large/giant protist viruses. Blue branches and labels denote giant protozoan viruses, and green branches and labels denote large micro-green alga viruses. Sputnik and Zamilon are shown with black branches and font. Bootstrap values (100 replicates) greater than 50 are shown at the nodes.
Scale bar, 0.2 amino acid substitutions per site.

TABLE 2.

Homologous genesᵃ shared between Dishui Lake virophages and large/giant viruses

Virophage ORF (no. of amino acids) Homology with giant protozoan virus (NCBI protein accession no.) (alignment length/identity/E value) Homology with large/giant alga virus (NCBI protein accession no.) (alignment length/identity/E value) Homology with DSLPVs/DSLLAV1 ORF no. (alignment length/identity/E value)
DSLV1
    ORF01 (261) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (203/28%/3e-14)
    ORF02 (288) Bathycoccus sp. RCC1105 virus hypothetical protein (ADQ91335.1) (202/27%/5e-18) DSLPV4-ORF64 (181/30%/2e-20)
    ORF25 (173) Paramecium bursaria Chlorella virus NY2A hypothetical protein (YP_001497873.1) (118/45%/5e-36)
DSLV2
    ORF01 (260) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (198/32%/1e-17)
    ORF06 (196) Chrysochromulina ericina virus hypothetical protein (YP_009173402.1) (173/34%/2e-22)
    ORF09 (218) -
    ORF10 (306) Heterosigma akashiwo virus 01 hypothetical protein (AOM63366.1) (210/28%/1e-15)
    ORF37 (107) Marseillevirus Shanghai 1 hypothetical protein MarSH_451 (AVR53156.1) (82/41%/4e-13)
    ORF38 (343) Marseillevirus LCMAC202 nucleotidyl transferase (QBK88130.1) (342/26%/5e-13) DSLLAV1-ORF214 (338/21%/5e-8)
DSLV3
    ORF01 (264) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (224/30%/1e-19)
    ORF17 (328) DSLLAV1-ORF315 (225/23%/3e-6)
    ORF22 (220) Mimivirus AB-566-O17 hypothetical protein (ARR74930.1) (172/22%/3e-7) Phaeocystis globosa virus hypothetical protein (YP_008052689.1) (158/25%/1e-13) DSLPV4-ORF64 (219/23%/3e-17)
    ORF24 (271) Pithovirus sibericum DNA-cytosine methyltransferase (YP_009001166.1) (250/34%/1e-36) Aureococcus anophagefferens virus cytosine-5 methyltransferase (YP_009052204.1) (248/32%/1e-34)
DSLV4
    ORF01 (268) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (210/27%/2e-13)
    ORF10 (130) Paramecium bursaria Chlorella virus NY2A hypothetical protein (YP_001497873.1) (130/44%/5e-37)
    ORF20 (131) Heterosigma akashiwo virus 01 hypothetical protein (AOM63366.1) (136/30%/2e-8) DSLLAV1-ORF315 (117/27%/3e-5)
    ORF26 (206) Micromonas pusilla virus SP1 hypothetical protein (AET84892.1) (189/26%/1e-16) DSLPV4-ORF64 (196/26%/2e-20)
DSLV5
    ORF01 (253) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (196/31%/4e-18)
    ORF02 (244) Mimivirus AB-566-O17 hypothetical protein (ARR74930.1) (170/26%/4e-14) DSLPV4-ORF64 (182/26%/1e-15)
    ORF08 (312) Heterosigma akashiwo virus 01 hypothetical protein (AOM63366.1) (189/25%/1e-10)
    ORF10 (849) Aureococcus anophagefferens virus putative D5-ATPase-helicase (YP_009052397.1) (534/24%/1e-28)
DSLV6
    ORF01 (270) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (218/25%/7e-12)
    ORF15 (488) Paramecium bursaria Chlorella virus glycoprotein repeat domain-containing protein (AGE59082.1) (420/51%/3e-89) DSLPV4-ORF180 (376/45%/1e-80)
    ORF17 (324) Heterosigma akashiwo virus 01 hypothetical protein (AOM63366.1) (218/26%/3e-14)
    ORF24 (210) Bathycoccus sp. RCC1105 virus hypothetical protein (YP_004061590.1) (220/25%/2e-15) DSLPV4-ORF64 (194/26%/2e-19)
DSLV7
    ORF01 (260) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (203/28%/4e-14)
    ORF05 (173) Paramecium bursaria Chlorella virus NY2A hypothetical protein (YP_001497873.1) (118/45%/5e-36)
    ORF30 (302) Bathycoccus sp. RCC1105 virus hypothetical protein (ADQ91335.1) (202/27%/9e-18) DSLPV3-ORF157 (201/27%/6e-12)
DSLV8
    ORF01 (253) Pithovirus LCPAC001 hypothetical protein (QBK89714.1) (196/31%/4e-18)
    ORF02 (244) Mimivirus hypothetical protein (ARR74930.1) (170/26%/4e-14) Bathycoccus sp. RCC1105 virus hypothetical protein (ADQ91335.1) (211/25%/5e-14) DSLPV4-ORF64 (182/26%/1e-15)
    ORF08 (312) Heterosigma akashiwo virus 01 hypothetical protein (AOM63366.1) (189/25%/1e-10)
    ORF10 (849) Aureococcus anophagefferens virus D5-ATPase-helicase (YP_009052397.1) (534/24%/1e-28)
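The "closest homolog" criterion used in the text (amino acid identity >40% and E value <10⁻²⁰) can be applied mechanically to the Table 2 entries. The sketch below encodes a subset of Table 2 rows, including every row with an identity above 40%, with subject names abbreviated, and recovers the four DSLV ORFs named in the text. The data class and function names are illustrative, and the snippet only filters and ranks the listed hits; it is not the authors' full pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    orf: str         # DSLV ORF, e.g. "DSLV6-ORF15"
    subject: str     # homolog in a large/giant virus (abbreviated name + accession)
    identity: float  # amino acid identity, %
    evalue: float


def close_homologs(hits, min_identity=40.0, max_evalue=1e-20):
    """Keep hits with identity > min_identity and E value < max_evalue."""
    return [h for h in hits if h.identity > min_identity and h.evalue < max_evalue]


def best_per_orf(hits):
    """For each ORF, keep the hit with the highest identity."""
    best = {}
    for h in hits:
        if h.orf not in best or h.identity > best[h.orf].identity:
            best[h.orf] = h
    return best


# Subset of Table 2 (PBCV, Paramecium bursaria Chlorella virus; AaV, Aureococcus anophagefferens virus).
table2_subset = [
    Hit("DSLV1-ORF25", "PBCV NY2A YP_001497873.1", 45.0, 5e-36),
    Hit("DSLV2-ORF37", "Marseillevirus Shanghai 1 AVR53156.1", 41.0, 4e-13),
    Hit("DSLV4-ORF10", "PBCV NY2A YP_001497873.1", 44.0, 5e-37),
    Hit("DSLV5-ORF10", "AaV YP_009052397.1", 24.0, 1e-28),
    Hit("DSLV6-ORF15", "PBCV AGE59082.1", 51.0, 3e-89),
    Hit("DSLV6-ORF15", "DSLPV4-ORF180", 45.0, 1e-80),
    Hit("DSLV7-ORF05", "PBCV NY2A YP_001497873.1", 45.0, 5e-36),
]

best = best_per_orf(close_homologs(table2_subset))
for orf in sorted(best):
    print(orf, best[orf].subject, f"{best[orf].identity:.0f}%")
```

DSLV2-ORF37 passes the identity threshold but not the E-value threshold, which is why the Marseillevirus hit does not appear among the closest homologs.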

Interestingly, similar evidence was observed for four known CVv systems: the Sputnik, Zamilon, Mavirus, and Rio Negro (RNV) virophages shared more genes (4, 1, 3, and 5, respectively) with closer homology to those of giant protozoan viruses (amino acid identity, 27% to 76%) but shared only two distant homologs with large/giant alga viruses after redundancy removal (amino acid identity, 25% to 28%) (Fig. 7B). Notably, the closest homologs of Sputnik and Zamilon genes come from their giant virus hosts, mamavirus and Mimivirus (Table 3), in agreement with the phylogenetic analysis, in which Sputnik and Zamilon cluster only with their giant protozoan virus hosts (Fig. 7D). It is therefore plausible that virophages and their giant virus hosts tend to share genes because of their specific parasite-host coexistence, which is likely one of the typical genetic characteristics of CVv systems.

TABLE 3.

Homologous genesᵃ shared between four known virophages and large/giant viruses

Virophage ORF (no. of amino acids) Homology with protozoan giant virus (NCBI protein accession no.) (alignment length/identity/E value) Homology with large/giant alga virus (NCBI protein accession no.) (alignment length/identity/E value)
Sputnik
ORF06 (310) Mamavirus collagen triple helix repeat protein (AEQ60378.1) (251/76%/9e-107) Paramecium bursaria Chlorella virus CviKI hypothetical protein (AGE51678.1) (140/28%/3e-6)
ORF12 (152) Mamavirus hypothetical protein (AEQ60743.1) (122/70%/8e-58)
ORF13 (779) Mimivirus DNA replication (AVG46848.1) (454/58%/0) Heterosigma akashiwo virus 01 D5-ATPase-helicase (AOM63458.1) (354/25%/1e-11)
ORF14 (114) Moumouvirus Monve hypothetical protein tv_L8 (AEY99266.1) (58/60%/1e-17)
ORF16 (130) Moumouvirus hypothetical protein (AQN67875.1) (83/49%/1e-16)
ORF17 (88) Moumouvirus australiensis transcriptional regulator (AVL93333.1) (51/43%/1e-7)
Mavirus
ORF20 (152) Cafeteria roenbergensis virus BV-PW1 hypothetical protein crov514 (YP_003970147.1) (112/49%/2e-25)
Zamilon
ORF08 (81) Moumouvirus Monve hypothetical protein tv_L8 (AEY99266.1) (53/72%/3e-21)
ORF09 (778) Mimivirus DNA replication (AVG46848.1) (456/73%/0) Heterosigma akashiwo virus 01 D5-ATPase-helicase (AOM63458.1) (340/24%/4e-14)
ORF15 (305) Mimivirus collagen triple helix repeat-containing protein (AKI78955.1) (192/72%/5e-82)
ORF19 (147) Mimivirus low complexity (AVG46387.1) (129/50%/3e-37)
RNV
ORF03 (245) Pithovirus hypothetical protein (QBK89714.1) (226/27%/1e-18)
ORF04 (139) Moumouvirus zinc finger C2H2-type domain-containing protein (AEY99263.1) (73/37%/8e-12)
ORF06 (310) Mamavirus collagen triple helix repeat-containing protein (AEQ60378.1) (251/76%/1e-107)
ORF08 (184) Pithovirus hypothetical protein (QBK89712.1) (112/32%/1e-7)
ORF12 (152) Mamavirus hypothetical protein (AEQ60743.1) (122/70%/5e-59)
ORF13 (779) Mimivirus DNA replication (AVG46848.1) (454/58%/0)
ORF14 (114) Moumouvirus hypothetical protein (AEY99266.1) (58/60%/3e-18)
ORF16 (130) Moumouvirus hypothetical protein (AQN67875.1) (83/49%/4e-17)
ORF20 (442) Marseillevirus nucleotidyl transferase (QBK88130.1) (354/28%/8e-36)

Taken together, the Dishui Lake virophages likely parasitize large/giant viruses, and hence eukaryotic hosts, that differ from those of the known CVv systems; large/giant green alga viruses, e.g., DSLPV4, DSLLAV1, and their close relatives, may serve as the virus hosts of DSLVs.

Codon usage analysis of DSLVs and large/giant alga viruses.

To further test the potential specific relationship between DSLVs and large/giant alga viruses, codon usage analysis was performed to reveal the genomic signature (GS), which reflects the total net response to selective pressure, of Dishui Lake virophages, Dishui Lake large alga viruses, known large/giant alga viruses, and known giant protozoan viruses, with known virophages and their giant protozoan virus hosts serving as positive controls.

As shown in Fig. 8A, the eight DSLVs were separated into two groups. The codon usage profiles of DSLV1 to 4, 6, and 7 most closely resembled those of the micro-green alga-infecting PBCV-1 (Chlorovirus) and TetV-1 (Mimiviridae group III), and they were also highly similar to those of the micro-green alga-infecting prasinoviruses, DSLPV4, and DSLLAV1. Correspondingly, high correlation values (>0.8) of codon usage frequency were detected among these genomes (Table 4). In contrast, the codon usage frequencies of DSLV5 and 8 were highly similar to those of both giant protozoan viruses (Mimiviridae groups I and II) and large alga viruses (Mimiviridae group III), consistent with the high correlation values of codon usage frequency among them (Table 4). As expected, Sputnik/Zamilon and their Mimivirus host, as well as Mavirus and its CroV host, were grouped together based on their similar codon usage, which indicates that codon usage analysis is reasonably reliable for inferring the virus hosts of DSLVs. Hence, these results again suggest that large/giant alga viruses, especially micro-green alga viruses such as DSLPV4 and DSLLAV1, are candidate virus hosts of DSLVs, in line with the dinucleotide relative frequency and correlation analyses described below.

FIG 8.

Genomic signature analysis of virophages and large/giant viruses. (A) Codon usage profiles on the genomic landscape. Rows are the codon usage frequencies of each genome, and columns represent codons. (B) Line chart of DiRF on the genomic landscape. Dinucleotide distribution values are ranked in descending order, and the similar DiRF profiles shared between DSLVs and large green alga viruses are shaded in gray. (C) Area chart of DiRF on the genomic landscape. The DiRF of each genome is displayed as colored areas, with the same dinucleotide shown in the same color across genomes. (D) Heatmap of DiRF correlations. Columns represent virophages, and rows denote large/giant viruses. PgVV, Phaeocystis globosa virus virophage; Mch, Megavirus chiliensis; Mc11, Megavirus courdo 11; ACMV, Acanthamoeba castellanii mamavirus; DSLLAV1, Dishui Lake large alga virus 1; CroV, Cafeteria roenbergensis virus. The full names of DSLV, RNV, DSLPV, YSLPV, OtV, AaV, CeV, OLPV, MpV, PBCV, and TetV are given in Fig. 2 and 4.

TABLE 4.

Pearson correlation values of codon usage frequency between genomes

Virus DSLV5 DSLV8 PgVV Sputnik Mavirus Zamilon RNV Guarani
CeV 0.966 0.965 0.902 0.959 0.965 0.958 0.961 0.964
AaV 0.971 0.966 0.931 0.969 0.957 0.969 0.970 0.971
PgV_16T 0.959 0.958 0.973 0.914 0.952 0.938 0.915 0.918
OLPV2 0.921 0.916 0.929 0.873 0.915 0.915 0.871 0.873
OLPV1 0.941 0.937 0.909 0.893 0.941 0.904 0.890 0.892
CroV 0.964 0.963 0.889 0.972 0.971 0.965 0.974 0.976
Mch 0.967 0.964 0.898 0.971 0.966 0.966 0.972 0.975
Mc11 0.967 0.964 0.900 0.972 0.966 0.969 0.973 0.976
Mimivirus 0.973 0.970 0.905 0.982 0.956 0.973 0.982 0.983
Moumouvirus 0.974 0.972 0.895 0.974 0.973 0.970 0.976 0.978

Dinucleotide relative frequencies (DiRF) and correlation analyses.

Genomic signatures of DSLVs and large/giant alga viruses were also examined based on oligonucleotide relative frequency (OnRF) and the corresponding Pearson’s correlation analyses (27). DSLVs showed DiRF profiles strikingly similar to those of large green alga viruses, such as PBCV-1, TetV-1, DSLPV4, and DSLLAV1, but clearly different from those of the giant protozoan viruses Mamavirus and CroV (Fig. 8B and C). Notably, the DiRF values of AA, TT, AT, TA, GG, CC, CG, and GC in DSLV5 and DSLV8 lay between those of large green alga viruses and giant protozoan viruses (Fig. 8B). The DiRF profiles are also consistent with the correlation values displayed in the heatmap in Fig. 8D (Table 5). Trinucleotide and tetranucleotide relative frequency patterns were similar to those of DiRF but with progressively lower correlation values (data not shown); therefore, they were not analyzed further.

TABLE 5.

Pearson correlation values of DiRF between genomes

Virus DSLV2 DSLV3 DSLV1 DSLV7 DSLV4 DSLV6 DSLV5 DSLV8 Sputnik Mavirus
PBCV1 0.877 0.854 0.932 0.936 0.945 0.968 0.947 0.948 0.926 0.923
TetV-1 0.880 0.879 0.915 0.915 0.946 0.970 0.914 0.913 0.908 0.900
DSLLAV1 0.819 0.512 0.791 0.795 0.729 0.772 0.889 0.888 0.885 0.818
DSLPV4 0.876 0.824 0.943 0.955 0.922 0.943 0.959 0.959 0.926 0.910
CroV 0.921 0.712 0.883 0.880 0.908 0.932 0.983 0.984 0.983 0.987
ACMV 0.909 0.764 0.912 0.912 0.932 0.959 0.990 0.990 0.982 0.967

A CRISPR-Cas-like system in DSLLAV1.

As shown in Fig. 9A, a 20-nucleotide fragment present in both DSLV5 ORF24 and DSLV8 ORF24 was detected in DSLLAV1 ORF142, with one mismatch (DSLV5 ORF24) and no mismatch (DSLV8 ORF24), respectively. Meanwhile, a pair of 47-bp repeated sequences, separated by 10 bp, was identified upstream of the 20-nucleotide fragment, overlapping it downstream by 8 bp. Interestingly, this overlapping 8-bp sequence occurs four times in the pair of 47-bp repeats and twice in the spacer. Moreover, a gene encoding a putative fusion protein (DSLLAV1 ORF140) is located upstream of ORF142, separated from it by only one ORF. ORF140 contains a DNA-binding helix-turn-helix (HTH) domain, a putative DNA-binding motif (NUMOD4 motif), and two HNH endonuclease domains; HNH endonuclease domains are a typical characteristic of Cas9 proteins. Additionally, a putative endonuclease gene (DSLLAV1 ORF136) was identified upstream of ORF140 and ORF142. Structural prediction and comparison indicate that DSLLAV1 ORF136 and ORF140 share clear similarity with CAS4_PYRCJ (NCBI protein accession number A3MTK6; 33.3% identity, 59% coverage) and CAS9_STAAU (NCBI protein accession number J7RUA5; 30.6% identity, 83% coverage), respectively (Fig. 10A and B). In addition, they are phylogenetically related to Cas4 and Cas9 proteins, respectively (Fig. 10C and D). Taken together, DSLLAV1 appears to possess a novel CRISPR-Cas-like system, named the Dishui Lake large alga virus virophage resistance element (DSLLAVVIRE), which may confer resistance to parasitization by DSLV5 and DSLV8.

FIG 9.

CRISPR-Cas-like system in DSLLAV1. (A) The organization and domain architectures of the DSLLAVVIRE system in DSLLAV1. The 20-bp segment shared between DSLV8 and DSLLAV1 is shaded in gray. The pair of 47-bp repeated sequences (R1) is underlined, and the 8-bp repeat (R2) is marked with horizontal brackets. (B) Schematic presentation of the DSLLAVVIRE system in DSLLAV1, the MIMIVIRE system in APMV-A, and the prokaryotic CRISPR-Cas system.

FIG 10.

(A and B) Structural analysis and (C and D) phylogenetic analysis of Cas-like proteins in DSLLAV1. Structural similarity is shared between (A) ORF136 and CAS4_PYRCJ and between (B) ORF140 and CAS9_STAAU. Unrooted maximum likelihood phylogenetic trees were constructed for (C) ORF136 and (D) ORF140. Bootstrap values (100 iterations) greater than 50 are shown at the branch nodes. Scale bar, 0.5 amino acid substitutions per site.

Notably, several short segments (20 bp) from DSLV2, 4, and 8 matched sequences in DSLPV4 and PBCV1 with no mismatch, but neither repeated sequences nor Cas-associated proteins were identified in the vicinity of these loci. Additionally, no CRISPR-Cas-like systems targeting DSLVs were detected in the other large green alga viruses discovered thus far.

DISCUSSION

Novel and diverse virophages discovered in Dishui Lake. Virophages are small dsDNA viruses with circular or linear genomes of 13 to 30 kbp that encode 16 to 34 genes, only a small subset of which is shared with large/giant viruses (28).

Since the discovery of the first virophage, the unique lifestyle of CVv systems has expanded our knowledge of viruses and their evolution (12–15, 17, 22, 29–31). Importantly, novel virophages have been continually found in coculture systems and environmental metagenomes, mainly from freshwater lakes, over the last decade, which suggests that their diversity and function in nature remain largely unknown (12–15, 17–20, 22, 25, 29–35).

In our previous study, virophages were detected in Dishui Lake throughout a whole year, strongly suggesting that they are indigenous residents of the lake. In that study, the complete genomes of a novel virophage, DSLV1 (25), and a large alga virus, DSLPV1 (24), were also obtained. Here, to gain insight into the intriguing CVv relationships in DSL, the DSL metagenomic data sets were comprehensively analyzed, which led to the discovery of the complete genomes of seven novel virophages along with four novel large green alga viruses.

DSLVs possess relatively long genomes (26 to 32 kbp) compared with those of all virophages known thus far (13 to 30 kbp), especially virophages that parasitize giant protozoan viruses of the Mimiviridae family, e.g., Sputnik (12), Mavirus (15), Zamilon (17), Guarani (30), and RNV (29), whose genomes are less than 20 kbp. Interestingly, by contrast, the virophages YSLV3 (19) and Mendota 1002791 (44), discovered in the Yellowstone Lake and Lake Mendota metagenomic data sets, respectively, and speculated to be associated with large/giant alga viruses rather than giant protozoan viruses, have genome sizes similar to those of DSLVs. Consistently, both YSLV3 and Mendota 1002791 grouped phylogenetically into the clade containing all DSLVs except DSLV2 and 3.

The GC content of DSLV genomes (32.2 to 45.2%) is closer to that of the polinton-like virophages (PLVs) (30 to 39%) of Chrysochromulina parva virus, which infects the freshwater haptophyte alga Chrysochromulina parva (31), and to that of the virophage-like element PgVV (36%), which is associated with Phaeocystis globosa virus 16T, a virus infecting the marine bloom-forming haptophyte alga Phaeocystis globosa (22). Notably, all known virophages involved in protozoan CVv systems have a lower GC content (less than 30%), clearly different from that of DSLVs.

Moreover, on the phylogenetic tree (Fig. 2B), DSLVs are closely related to the virophages proposed to be associated with large/giant alga viruses but distantly related to those involved in protozoan CVv systems, such as Sputnik, Zamilon, RNV, Guarani, and Mavirus. The homologs shared between DSLVs and other known virophages support this conclusion as well.

Interestingly, DSLVs were also detected in freshwater environments neighboring DSL, e.g., the Dazhi River (an inflow of Dishui Lake), Dianshan Lake, and the Yangtze River Estuary, but not in Shanghai coastal seawater (data not shown). This suggests that DSLVs are specifically adapted to protist hosts in freshwater habitats. Consistently, metagenomic analysis showed that micro-green algae (Chlorophyta) are highly dominant in DSL compared with other protists, especially protozoa, and that large/giant alga viruses are far more abundant than giant protozoan viruses (data not shown).

Taken together, our results suggest that DSLVs are very likely involved in green alga-hosted CVv tripartite systems.

Novel CVv system. All virophages isolated by coculture (Sputnik, Mavirus, Zamilon, and RNV) require infection by a protozoan Mimiviridae virus to replicate and produce infectious virions, and each possesses at least one gene with close homology to a gene of its native host virus (Fig. 7B). For example, Sputnik has the most genes (three) with close homology (identity, >40%; E value, <10⁻²⁰) to genes of the giant protozoan mamavirus but no genes with such close homology to those of any large/giant alga virus (Table 3).

Similar evidence was found for DSLVs as well. Four DSLVs have five genes with close homology (identity, >40%; E value, <10⁻²⁰) to genes of large alga viruses but none with homology to genes of either known giant protozoan viruses (Fig. 7A) or the giant protozoan virus-related sequences identified in the DSL metagenomic data sets (data not shown). Moreover, these four DSLVs share their closely homologous genes only with green alga (chlorophyte)-infecting large viruses, e.g., Paramecium bursaria Chlorella virus (Phycodnaviridae) and DSLPV4 (Table 2).

Meanwhile, DSLV6 also shares one close homolog with DSLPV4 (identity, 45%; E value, 10⁻⁸⁰). In total, five nonredundant homologous genes (identity, 21 to 45%; E value, 10⁻⁵ to 10⁻⁸⁰) are shared between DSLVs and the Dishui Lake large alga viruses DSLPV3, DSLPV4, and DSLLAV1. In contrast, no unambiguous homologous genes were identified between DSLVs and DSLPV1 or 2, and only one homologous gene was shared between DSLVs and DSLPV3 before redundancy was removed (Table 2).

Conservatively, these results suggest that the potential virus hosts of DSLVs are large/giant green alga viruses, such as Paramecium bursaria Chlorella virus, DSLPV4, and DSLLAV1, but not DSLPV1 to 3.

To shed more light on the association between DSLVs and their potential virus hosts, both relative codon usage (RCU) and oligonucleotide relative frequency (OnRF) analyses were performed to uncover common genomic patterns resulting from coevolution, such as the total net response of both virophages and their virus hosts to selective pressure from their eukaryotic hosts (27). These results again suggest that DSLVs are likely involved in CVv systems with large/giant green alga viruses as virus hosts.

A CRISPR-Cas-like system in DSLLAV1. The CRISPR-Cas system is a bona fide adaptive (acquired) immune system whose immune memory is stored as spacer sequences derived from foreign genomes and inserted into CRISPR arrays; it has been found in about 48% of bacteria and 80% of archaea (36). Strikingly, the MIMIVIRE system, a nonhomologous CRISPR-Cas-like system, was identified in giant protozoan viruses (37). It is a nucleic acid-based defense system that confers resistance to Zamilon virophage infection on lineage A mimiviruses. Thus far, MIMIVIRE is the only antivirophage defense system reported in giant viruses. In this study, surprisingly, a putative nonhomologous CRISPR-Cas-like system, DSLLAVVIRE, was detected in the Mimiviridae-related large alga virus DSLLAV1; it likely acts as a nucleic acid-based defense system that confers resistance to parasitization by the virophages DSLV5 and 8. One apparent difference among the MIMIVIRE, DSLLAVVIRE, and CRISPR systems for defense against invading foreign nucleic acids lies in the structural organization of their repeated sequences (Fig. 9B). The DSLLAVVIRE system is characterized by a chimeric array of four short repeated sequences distributed within two long repeated sequences. Like the MIMIVIRE system but unlike the CRISPR system, the short repeated sequences are shared between virophages and large alga viruses; unlike the MIMIVIRE system but like the CRISPR system, the long repeated sequences occur only in large alga viruses. Whether such a unique array of repeated sequences enhances sequence-specific recognition of invading virophages by large alga viruses or represents an ancient nucleic acid-based defense system requires further in-depth investigation. Interestingly, CRISPR-Cas9 family proteins contain only one HNH endonuclease domain, whereas two HNH endonuclease domains were identified in the Cas9-like protein encoded by DSLLAV1 ORF140. This may promote the cleavage activity of the Cas9-like protein and consequently strengthen the host's defense against virophage parasites. Further experimental work is needed to verify these speculations.

In conclusion, diverse virophages and large green alga viruses were discovered in the freshwater lake DSL, Shanghai, China. Their close relatives were found in Yellowstone Lake (19, 20) and Lake Mendota (32), USA, even though these two lakes lie in the Western Hemisphere, far from DSL, and are about 1,690 kilometers apart. It would be very interesting to determine how these freshwater virophage relatives are distributed globally, especially since we have not detected any of them in the seawater that receives the outflow of DSL. The large/giant green alga viruses, whether phycodnaviruses or Mimiviridae-related alga viruses, appear to be involved in the CVv tripartite systems in DSL. Isolation of virophages and large/giant green alga viruses from DSL by coculture is under way in our lab, and the preliminary data support the CVv tripartite infection systems presented here. Strikingly, DSLLAV1 possesses a special CRISPR-Cas-like system, which likely protects DSLLAV1 from infection by DSLVs. The discovery of DSLLAVVIRE further supports the presence of CVv systems in DSL and opens the door to exploring the complex multipartite interaction mechanisms of CVv.

MATERIALS AND METHODS

Environmental metagenomic data sets.

Surface water samples (depth, <1 m; 500 to 1,000 ml) were collected from Dishui Lake (30°53′56.00″N, 121°55′27.00″E) once a month for a whole year, from October 2013 to September 2014 (no permission was required because the lake is open to the public), and microbial biomass was collected onto 0.22-μm membrane filters as described previously (25). Two metagenomic libraries, containing 430-bp paired-end (PE) and 3-kb mate-pair (MP) inserts, respectively, were sequenced on a MiSeq sequencer (Illumina) with the standard MiSeq protocols (Shanghai Personal Biotechnology). The total sequence outputs were 41.81 Gbp and 7.87 Gbp for the PE and MP libraries, respectively. Quality assessment and control of raw data were performed as previously described (25). General information on the two libraries is shown in Table 6.

TABLE 6.

Summary of the DSL metagenomic data sets

Type Run Raw data (no. of reads) QC by the pipeline (no. of reads) QC by NGS (no. of reads)
PE 1 35,371,138 23,505,862 23,151,200
PE 2 20,766,464 17,580,296 17,507,722
PE 3 7,974,858 6,239,280 6,202,911
PE 4 17,264,916 14,037,596 13,964,065
MP 1 15,136,922 12,272,954 12,216,490
MP 2 1,833,418 1,407,000 1,387,258
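The 60,825,898 clean PE reads used for assembly correspond to the sum of the four PE runs in the "QC by NGS" column of Table 6:

```latex
\[
23{,}151{,}200 + 17{,}507{,}722 + 6{,}202{,}911 + 13{,}964{,}065 = 60{,}825{,}898
\]
```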

Metagenomic sequence assembly.

A total of 60,825,898 clean reads from the PE library were assembled into contigs using Newbler v2.6 (Roche) with default parameters. Assembly metrics were assessed with QUAST v2.3 (38). Virophage genomes were assembled as follows. First, major capsid proteins (MCPs), which are conserved in all known virophages, were used as query sequences to search the DSL metagenomic assemblies (tBLASTx; E value, <10⁻⁵); matched contigs were then assembled using procedures similar to those described previously (25), with some modifications. Briefly, each virophage-related contig served as a reference sequence, and reference assembly was performed with trimmed reads from the Dishui Lake metagenomic data sets, using a minimum overlap length of 25 bp and a minimum overlap identity of 90%. The reference assembly was repeated until the assembled sequence stopped extending.
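The iterative reference assembly described above can be illustrated with a minimal Python sketch. It is not the Geneious Prime algorithm: it assumes error-free forward-strand reads, extends only the 3′ end of the contig, and appends the longest overhang offered by any read whose prefix overlaps the contig end by at least 25 bp at no less than 90% identity, repeating until no read extends the contig. The function names are hypothetical.

```python
def overlap_identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_extension(contig, reads, min_overlap=25, min_identity=0.90):
    """Longest 3' overhang offered by any read whose prefix overlaps the contig end."""
    best = ""
    for read in reads:
        # Try the longest overlap first; the read must still add at least one new base.
        for olen in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
            if overlap_identity(contig[-olen:], read[:olen]) >= min_identity:
                if len(read) - olen > len(best):
                    best = read[olen:]
                break  # keep only the longest acceptable overlap for this read
    return best


def extend_contig(contig, reads, min_overlap=25, min_identity=0.90, max_rounds=10_000):
    """Repeat the extension step until the contig stops growing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        overhang = best_extension(contig, reads, min_overlap, min_identity)
        if not overhang:
            break
        contig += overhang
    return contig
```

Because each round rescans every read, this brute-force version only suits toy inputs; a real reference assembler uses indexed read mapping, consensus calling, and both read orientations.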

Large alga virus genomes were assembled as follows: contigs longer than 10 kb were selected and subjected to ORF prediction using MetaGeneMark (39); the translated amino acid sequences of all predicted ORFs were then searched (BLASTp; E value, <10⁻⁵) against the NCBI nonredundant (nr) database; once contigs were initially confirmed to be related to large alga viruses by clustering analysis of annotated ORFs with MEGAN (40), they were used as templates in reference assembly as described above for the virophages. All assemblies were performed with Geneious Prime (Biomatters).

Assembly check. The integrity and accuracy of the assembled viral genomes were double-checked by a combination of mate-pair (MP) library mapping and PCR amplification with specific primers designed from the genomic sequences. First, all metagenomic assemblies and clean reads of the DSL MP library were mapped to the assembled viral genomic sequences; then, any ambiguous or low-coverage genomic regions, as well as the circular nature of the genomes, were verified by PCR amplification, cloning, and sequencing. PCRs (25 μl) contained 1 mM forward and reverse primers (Table 7), 12.5 μl 2× Taq PCR master mix (Sangon Biotech), and 10 ng of DNA template. The PCR thermal cycling conditions were as follows: initial denaturation at 94°C for 4 min, followed by 30 cycles of 94°C for 30 s, 52 to 59°C (depending on the primers used [Table 7]) for 30 s, and 72°C for 1 min. PCR amplicons were checked by 0.7% agarose gel electrophoresis.

TABLE 7.

Primer sets for verification of virophage genomes

Virophage Primer set Target length (bp) Forward primer (5′–3′) Reverse primer (5′–3′)
DSLV2 1 1,015 GAGACGGGTCGCCACTTTAA TCAAGTGGGCGAAGTCTGTC
DSLV2 2 1,196 ATTTGCGAGTGGAGGGAAGG TCTTGTTCGTCGTTCGTCGT
DSLV2 3 759 GACGGCTTCAATAACTTCTG GGCAGACCAAAATTAGAGTG
DSLV2 4 1,053 CCTACGGGTTAAGTTGAAAT AATGAAAAATCCGCACCATC
DSLV3 1 928 CGATGCGTAGCAGACAGGTT CGCTCCGCACGAGTTAATTG
DSLV3 2 649 TGAATTAGCACAAACTATCA AGATTATGGATTTTCGAGAG
DSLV3 3 722 CATCGTAAAGTTGCTTAAAA TTAAAATAAACAAGGGGTCA
DSLV4 1 931 GTATTAATCGTAAAACCAGC TACTTGGAAAATACTTGGAG
DSLV4 2 987 CAATTACGATGGAATACAAC TACTTGGAAAATACTTGGAG
DSLV5 1 621 ATGACCTGCCATTTTAATAT GGTTGTTTTTGGTATGAAAT
DSLV5 2 854 TCTAAAATTGTGCTGTATGA GAAGTCTTATCATTCGGTAA
DSLV5 3 535 TAGAAGAATGGCAGAGTATA TGATCCATAGACATATCCTT
DSLV6 1 565 TTCAAGCAGCAACCGCTG GTGTCTGTGATACGCTGATG
DSLV6 2 1,043 AGTCGCATCATCTACGACCT TGTAGATGACGGAGAGTTCG
DSLV6 3 933 GTAGGTTATGGAGGGACGTA TTGTGAGGTCTCTCGTCTTC
DSLV7 1 915 CGTTAGTGGTGTATTCTATT CGTCTTCCATATATTCTCTC
DSLV7 2 906 ACAAAAATCTCAAAAAGTCC GGTTCGGCATTTAATAATAC
DSLV7 3 823 GAAGCAACATCATCAATATC TATCTCCTGACTGAATATGT
DSLV8 1 852 TAAAATTGTGCTGTATGAGA GAAGTCTTATCATTCGGTAA
DSLV8 2 739 GAATATGAGTGAAATTGACG GATCTACCCCACTAAAATTT
DSLV8 3 871 ATGACCTGCCATTTTAATAT TCCATTACCACTTAAAAGAG

Viral genome analysis.

Open reading frames (ORFs) were predicted in Geneious Prime using ATG as the start codon and a minimum length of 150 bp. They were annotated by BLASTp search (E value, <10⁻⁵) against the NCBI nr database, InterProScan v5 (41), and the NCBI Conserved Domain Search (42). tRNA sequences were identified using tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/). Repetitive sequences were detected with both Geneious Prime and the REPuter program (http://bibiserv.techfak.uni-bielefeld.de/reputer/).
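As an illustration of the ORF criteria above (ATG start, minimum length of 150 bp), the following Python sketch scans all six reading frames. It assumes that the 150-bp minimum includes the stop codon and that ORFs without an in-frame stop codon are discarded; Geneious Prime may apply these rules differently. The function names are hypothetical, and minus-strand coordinates refer to the reverse-complemented sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=150):
    """Return (strand, start, end) for ATG-initiated ORFs of at least min_len bp.

    Coordinates are 0-based and half-open on the scanned strand; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in frame opens the ORF (longest ORF kept)
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return orfs
```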

Phylogenetic analysis. For virophages, phylogenetic trees were constructed from a concatenated alignment of four virophage core genes encoding the major capsid protein, minor capsid protein, DNA packaging ATPase, and cysteine protease. Reference virophage sequences (7 isolates and 27 complete or partial genomes) were downloaded from NCBI, EMBL, or CyVerse. Homologous amino acid sequences were aligned using MUSCLE (43), and alignment positions containing >30% gaps or lacking phylogenetic information were removed manually. The filtered alignments were used for tree reconstruction with FastTree v2.1.7 (WAG model, gamma parameter estimated) (44).
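The gap criterion applied to the virophage core-gene alignments (removal of positions with >30% gaps) can be sketched as a column filter. This Python example is hypothetical and implements only the gap rule; the removal of noninformative positions, which the authors did manually, is not modeled.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.30):
    """Drop alignment columns whose fraction of gaps ('-') exceeds max_gap_fraction.

    alignment: list of aligned sequences of equal length.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    n = len(alignment)
    keep = [j for j in range(length)
            if sum(s[j] == "-" for s in alignment) / n <= max_gap_fraction]
    return ["".join(s[j] for j in keep) for s in alignment]
```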

For large/giant viruses, a concatenated alignment of four conserved protein-encoding genes (DNA polymerase B family, A18 helicase, DNA packaging ATPase, and serine/threonine protein kinase) was used to reconstruct phylogenetic trees. Amino acid sequences were aligned using MUSCLE (43), and maximum likelihood trees were constructed with MEGA X (45) using the Le-Gascuel (LG) model and 100 bootstrap replicates.

Genetic links between DSL virophages and other virophages.

The relationships of DSLVs to other virophages were also explored at the genomic level. Homologous genes shared between DSLVs and other known virophages were identified by using the amino acid sequences of DSL virophages as queries to search (local BLASTp; E value, <10⁻⁵) a local database containing all protein sequences of the other virophages. The shared homologs were plotted as a heatmap using the pheatmap R package.

Genetic links between virophages and large/giant viruses.

To explore the number and affinity of genes shared between virophages and large/giant viruses of algae and protozoa, a local database was constructed containing all protein sequences of large/giant protist viruses downloaded from the NCBI nr database. Homologous genes shared between virophages (DSLVs plus four known virophages, Sputnik, Mavirus, Zamilon, and Rio Negro virophage [RNV], serving as positive controls) and large/giant protist viruses were identified by using the protein-encoding sequences of the virophages as queries to search the local database (local BLASTp; E value, <10⁻⁵). Subsequently, the identified homologous genes were used as queries in an online BLASTp search (E value, <10⁻⁵), and top hits that differed from the local BLAST results were downloaded and included in phylogenetic tree reconstruction.

To determine the number and affinity of genes shared between virophages and large viruses in Dishui Lake, all predicted proteins of the Dishui Lake virophages were searched (BLASTp; E value, <10⁻⁵) against a local database comprising all predicted proteins from the assembled genomic sequences of DSL large viruses. Only the top hit for each query was recorded.
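The BLASTp filtering steps above (E value <10⁻⁵, one top hit per query) and the "close homology" thresholds used in the Discussion (identity >40%, E value <10⁻²⁰) can be expressed as a short Python sketch. It assumes BLAST+ tabular output (-outfmt 6 with its default 12 columns) and ranks hits by bit score; the function names are illustrative, not part of the authors' pipeline.

```python
def parse_blast_tab(lines, max_evalue=1e-5):
    """Return {query: best hit} from BLAST+ -outfmt 6 lines, keeping hits with E < max_evalue.

    Default column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = {"query": f[0], "subject": f[1], "pident": float(f[2]),
               "length": int(f[3]), "evalue": float(f[10]), "bitscore": float(f[11])}
        if hit["evalue"] >= max_evalue:
            continue
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit  # keep the top hit (highest bit score) per query
    return best


def is_close_homolog(hit, min_identity=40.0, max_evalue=1e-20):
    """Apply the stricter 'close homology' cutoffs (identity > 40%, E < 1e-20)."""
    return hit["pident"] > min_identity and hit["evalue"] < max_evalue
```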

Codon usage analysis of virophages and large/giant viruses.

All coding sequences from each viral genome were extracted and analyzed for genome-wide codon usage frequency with the Sequence Manipulation Suite (https://www.bioinformatics.org/sms2/index.html), and a heatmap of the resulting frequency data was constructed using the pheatmap R package.
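A minimal Python sketch of the codon usage comparison: codon counts are pooled over all coding sequences of a genome, converted to each codon's fraction of all codons, and compared between genomes with Pearson's correlation, as in Table 4. It assumes that "codon usage frequency" means each codon's share of all codons; per-thousand values would give the same correlation because Pearson's r is scale-invariant, whereas fractions among synonymous codons would not. The function names are hypothetical.

```python
from itertools import product
from math import sqrt

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 codons in fixed order


def codon_frequencies(cds_list):
    """Pooled codon frequencies (fraction of all codons) over a genome's coding sequences."""
    counts = dict.fromkeys(CODONS, 0)
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in counts:  # skips codons containing ambiguous bases such as N
                counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)
```

For two genomes, pearson(codon_frequencies(cds_a), codon_frequencies(cds_b)) yields one cell of a Table 4-style matrix.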

Oligonucleotide frequencies and correlation analyses.

The usage and relative frequency of each oligonucleotide (dinucleotides, trinucleotides, and tetranucleotides) were analyzed with a Perl script. Pearson’s correlation coefficients between the di-, tri-, and tetranucleotide frequency profiles of pairs of viral genomes were calculated using SigmaPlot v14.0 with default parameters. Significant correlation values between virophages and large/giant viruses were plotted as a heatmap using the pheatmap R package.
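The oligonucleotide relative frequencies can be sketched in Python as overlapping k-mer counts divided by the total number of k-mers of that length (k = 2, 3, or 4), followed by Pearson's correlation between two genomes' profiles. This is an assumption about the authors' Perl script: if DiRF was instead defined as Karlin's odds ratio, f_XY/(f_X f_Y), or counted on both strands, the normalization step would change. The function names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_relative_frequencies(seq, k=2):
    """Relative frequency of each of the 4**k k-mers (overlapping windows, one strand)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other ambiguity codes are ignored
            counts[word] += 1
    total = sum(counts.values())
    return {w: (counts[w] / total if total else 0.0) for w in kmers}


def profile_correlation(freq_a, freq_b):
    """Pearson's r between two k-mer profiles built with the same k (same key order)."""
    x, y = list(freq_a.values()), list(freq_b.values())
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```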

Identification of a CRISPR-like system in DSLPVs/DSLLAV1.

The genomes of all eight Dishui Lake virophages were fragmented into short segments (k-mers) of 15, 20, 25, 30, 35, and 40 nucleotides using an algorithm written in Perl. All segments were mapped, allowing no mismatch or one mismatch, to the genomes of the DSLPVs and DSLLAV1 as well as to those of other known large alga viruses. Large alga viral genomic sequences that matched virophage sequences were further analyzed for repeated sequences and Cas proteins. Putative Cas proteins were first identified with several gene annotation tools, e.g., InterProScan (41), eggNOG-mapper (46), Batch CD-Search (42), and HHpred (47), and then confirmed by structural comparison with related known Cas proteins (SWISS-MODEL [48]). Maximum likelihood phylogenetic trees of these proteins were constructed using FastTree v2.1 (44) from protein sequences aligned with Clustal W (49), including NCBI BLAST top hits and a subset of experimentally validated Cas family proteins.
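The segment-mapping step can be sketched in Python. This example assumes the virophage genome is cut into overlapping sliding-window k-mers (the original Perl script may have tiled non-overlapping segments) and searches both strands of a large-virus genome for matches with at most one mismatch. Candidate positions come from a pigeonhole seed: if a k-mer differs from the target by at most one base, at least one of its two halves must match exactly, so indexing the target's half-length substrings finds every such hit without scanning each position. The function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def index_words(text, length):
    """Map every substring of the given length to its start positions in text."""
    index = defaultdict(list)
    for i in range(len(text) - length + 1):
        index[text[i:i + length]].append(i)
    return index


def map_kmers(query, target, k=20, max_mismatch=1):
    """Find query k-mers in target (either strand) with <= max_mismatch mismatches.

    Returns sorted tuples (query_pos, strand, target_pos, mismatches); minus-strand
    positions refer to the reverse complement of target.
    """
    if max_mismatch not in (0, 1):
        raise ValueError("two-piece seeding is only exhaustive for 0 or 1 mismatch")
    query, target = query.upper(), target.upper()
    h1 = k // 2          # left-half length
    h2 = k - h1          # right-half length (one longer when k is odd)
    hits = set()
    for strand, t in (("+", target), ("-", reverse_complement(target))):
        left_index = index_words(t, h1)
        right_index = left_index if h2 == h1 else index_words(t, h2)
        for q in range(len(query) - k + 1):
            kmer = query[q:q + k]
            candidates = set(left_index.get(kmer[:h1], []))
            candidates.update(p - h1 for p in right_index.get(kmer[h1:], []))
            for p in candidates:
                if 0 <= p <= len(t) - k:
                    mismatches = hamming(kmer, t[p:p + k])
                    if mismatches <= max_mismatch:
                        hits.add((q, strand, p, mismatches))
    return sorted(hits)
```

Running this for each k (15 to 40) reproduces the idea of the screen; loci it reports would then still need the manual inspection for flanking repeats and Cas-like genes described above.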

Data availability.

The data sets generated and/or analyzed during the current study are available in GenBank under accession numbers MN940570 (DSLV2), MN940572 (DSLV3), MN940571 (DSLV4), MN940574 (DSLV5), MN940573 (DSLV6), MN940576 (DSLV7), MN940575 (DSLV8), MN940577 (DSLPV2), MN940578 (DSLPV3), MN940579 (DSLPV4), and MN940580 (DSLLAV1). All data generated or analyzed during this study are included in this published article and its supplementary information files.

Supplementary Material

Supplemental file 1

ACKNOWLEDGMENTS

We declare that we have no competing interests.

This work was supported by the National Natural Science Foundation of China (41376135, 31570112, and 41876195).

Y.W. conceived and designed this study. S.X., L.Z., X.L., Y.Z., and H.C. assembled the sequences and analyzed the data. H.C. performed the sampling and experiments. S.Y. contributed reagents and critically revised the manuscript. Y.W., S.X., and L.Z. wrote the manuscript. Y.W. analyzed the data. All authors read and approved the final manuscript.

Footnotes

Supplemental material is available online only.

REFERENCES

  • 1.La Scola B, Audic S, Robert C, Jungang L, de Lamballerie X, Drancourt M, Birtles R, Claverie J-M, Raoult D. 2003. A giant virus in amoebae. Science 299:2033–2033. doi: 10.1126/science.1081867. [DOI] [PubMed] [Google Scholar]
  • 2.Abergel C, Legendre M, Claverie J-M. 2015. The rapidly expanding universe of giant viruses: Mimivirus, Pandoravirus, Pithovirus and Mollivirus. FEMS Microbiol Rev 39:779–796. doi: 10.1093/femsre/fuv037. [DOI] [PubMed] [Google Scholar]
  • 3. Claverie J-M, Ogata H, Audic S, Abergel C, Suhre K, Fournier P-E. 2006. Mimivirus and the emerging concept of “giant” virus. Virus Res 117:133–144. doi: 10.1016/j.virusres.2006.01.008.
  • 4. Desjardins C, Eisen JA, Nene V. 2005. New evolutionary frontiers from unusual virus genomes. Genome Biol 6:212. doi: 10.1186/gb-2005-6-3-212.
  • 5. Suzan-Monti M, La Scola B, Raoult D. 2006. Genomic and evolutionary aspects of Mimivirus. Virus Res 117:145–155. doi: 10.1016/j.virusres.2005.07.011.
  • 6. Claverie J-M. 2006. Viruses take center stage in cellular evolution. Genome Biol 7:110. doi: 10.1186/gb-2006-7-6-110.
  • 7. Raoult D, Forterre P. 2008. Redefining viruses: lessons from Mimivirus. Nat Rev Microbiol 6:315–319. doi: 10.1038/nrmicro1858.
  • 8. Claverie J-M, Grzela R, Lartigue A, Bernadac A, Nitsche S, Vacelet J, Ogata H, Abergel C. 2009. Mimivirus and Mimiviridae: giant viruses with an increasing number of potential hosts, including corals and sponges. J Invertebr Pathol 101:172–180. doi: 10.1016/j.jip.2009.03.011.
  • 9. Forterre P. 2010. Giant viruses: conflicts in revisiting the virus concept. Intervirology 53:362–378. doi: 10.1159/000312921.
  • 10. Claverie J-M, Abergel C. 2010. Mimivirus: the emerging paradox of quasi-autonomous viruses. Trends Genet 26:431–437. doi: 10.1016/j.tig.2010.07.003.
  • 11. Claverie J-M, Abergel C. 2013. Open questions about giant viruses. Adv Virus Res 85:25–56. doi: 10.1016/B978-0-12-408116-1.00002-1.
  • 12. La Scola B, Desnues C, Pagnier I, Robert C, Barrassi L, Fournous G, Merchat M, Suzan-Monti M, Forterre P, Koonin E, Raoult D. 2008. The virophage as a unique parasite of the giant mimivirus. Nature 455:100–104. doi: 10.1038/nature07218.
  • 13. Desnues C, La Scola B, Yutin N, Fournous G, Robert C, Azza S, Jardot P, Monteil S, Campocasso A, Koonin EV, Raoult D. 2012. Provirophages and transpovirons as the diverse mobilome of giant viruses. Proc Natl Acad Sci U S A 109:18078–18083. doi: 10.1073/pnas.1208835109.
  • 14. Gaia M, Pagnier I, Campocasso A, Fournous G, Raoult D, La Scola B. 2013. Broad spectrum of Mimiviridae virophage allows its isolation using a mimivirus reporter. PLoS One 8:e61912. doi: 10.1371/journal.pone.0061912.
  • 15. Fischer MG, Suttle CA. 2011. A virophage at the origin of large DNA transposons. Science 332:231–234. doi: 10.1126/science.1199412.
  • 16. Fischer MG, Allen MJ, Wilson WH, Suttle CA. 2010. Giant virus with a remarkable complement of genes infects marine zooplankton. Proc Natl Acad Sci U S A 107:19508–19513. doi: 10.1073/pnas.1007615107.
  • 17. Gaia M, Benamar S, Boughalmi M, Pagnier I, Croce O, Colson P, Raoult D, La Scola B. 2014. Zamilon, a novel virophage with Mimiviridae host specificity. PLoS One 9:e94923. doi: 10.1371/journal.pone.0094923.
  • 18. Yau S, Lauro FM, DeMaere MZ, Brown MV, Thomas T, Raftery MJ, Andrews-Pfannkoch C, Lewis M, Hoffman JM, Gibson JA, Cavicchioli R. 2011. Virophage control of Antarctic algal host-virus dynamics. Proc Natl Acad Sci U S A 108:6163–6168. doi: 10.1073/pnas.1018221108.
  • 19. Zhou J, Zhang W, Yan S, Xiao J, Zhang Y, Li B, Pan Y, Wang Y. 2013. Diversity of virophages in metagenomic data sets. J Virol 87:4225–4236. doi: 10.1128/JVI.03398-12.
  • 20. Zhou J, Sun D, Childers A, McDermott TR, Wang Y, Liles MR. 2015. Three novel virophage genomes discovered from Yellowstone Lake metagenomes. J Virol 89:1278–1285. doi: 10.1128/JVI.03039-14.
  • 21. Zhang W, Zhou J, Liu T, Yu Y, Pan Y, Yan S, Wang Y. 2015. Four novel algal virus genomes discovered from Yellowstone Lake metagenomes. Sci Rep 5:15131. doi: 10.1038/srep15131.
  • 22. Santini S, Jeudy S, Bartoli J, Poirot O, Lescot M, Abergel C, Barbe V, Wommack KE, Noordeloos AAM, Brussaard CPD, Claverie J-M. 2013. Genome of Phaeocystis globosa virus PgV-16T highlights the common ancestry of the largest known DNA viruses infecting eukaryotes. Proc Natl Acad Sci U S A 110:10800–10805. doi: 10.1073/pnas.1303251110.
  • 23. Blanc G, Gallot-Lavallée L, Maumus F. 2015. Provirophages in the Bigelowiella genome bear testimony to past encounters with giant viruses. Proc Natl Acad Sci U S A 112:E5318–E5326. doi: 10.1073/pnas.1506469112.
  • 24. Chen H, Zhang W, Li X, Pan Y, Yan S, Wang Y. 2018. The genome of a prasinoviruses-related freshwater virus reveals unusual diversity of phycodnaviruses. BMC Genomics 19:49. doi: 10.1186/s12864-018-4432-4.
  • 25. Gong C, Zhang W, Zhou X, Wang H, Sun G, Xiao J, Pan Y, Yan S, Wang Y. 2016. Novel virophages discovered in a freshwater lake in China. Front Microbiol 7:5. doi: 10.3389/fmicb.2016.00005.
  • 26. Wilson W, Van Etten JL, Allen M. 2009. The Phycodnaviridae: the story of how tiny giants rule the world. Curr Top Microbiol Immunol 328:1–42.
  • 27. Serrano-Solís V, Soares PET, de Farías ST. 2019. Genomic signatures among Acanthamoeba polyphaga entoorganisms unveil evidence of coevolution. J Mol Evol 87:7–15. doi: 10.1007/s00239-018-9877-1.
  • 28. Mougari S, Sahmi-Bounsiar D, Levasseur A, Colson P, La Scola B. 2019. Virophages of giant viruses: an update at eleven. Viruses 11:733. doi: 10.3390/v11080733.
  • 29. Borges IA, de Assis FL, dos Santos Silva SK, Abrahão J. 2018. Rio Negro virophage: sequencing of the near complete genome and transmission electron microscopy of viral factories and particles. Braz J Microbiol 49:260–261. doi: 10.1016/j.bjm.2018.07.003.
  • 30. Mougari S, Bekliz M, Abrahao J, Di Pinto F, Levasseur A, La Scola B. 2019. Guarani virophage, a new Sputnik-like isolate from a Brazilian lake. Front Microbiol 10:1003. doi: 10.3389/fmicb.2019.01003.
  • 31. Stough JMA, Yutin N, Chaban YV, Moniruzzaman M, Gann ER, Pound HL, Steffen MM, Black JN, Koonin EV, Wilhelm SW, Short SM. 2019. Genome and environmental activity of a Chrysochromulina parva virus and its virophages. Front Microbiol 10:703. doi: 10.3389/fmicb.2019.00703.
  • 32. Roux S, Chan L-K, Egan R, Malmstrom RR, McMahon KD, Sullivan MB. 2017. Ecogenomics of virophages and their giant virus hosts assessed through time series metagenomics. Nat Commun 8:858. doi: 10.1038/s41467-017-01086-2.
  • 33. Bekliz M, Verneau J, Benamar S, Raoult D, La Scola B, Colson P. 2015. A new Zamilon-like virophage partial genome assembled from a bioreactor metagenome. Front Microbiol 6:1308. doi: 10.3389/fmicb.2015.01308.
  • 34. Yutin N, Kapitonov VV, Koonin EV. 2015. A new family of hybrid virophages from an animal gut metagenome. Biol Direct 10:19. doi: 10.1186/s13062-015-0054-9.
  • 35. Oh S, Yoo D, Liu W-T. 2016. Metagenomics reveals a novel virophage population in a Tibetan mountain lake. Microbes Environ 31:173–177. doi: 10.1264/jsme2.ME16003.
  • 36. Koonin EV, Makarova KS, Zhang F. 2017. Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol 37:67–78. doi: 10.1016/j.mib.2017.05.008.
  • 37. Levasseur A, Bekliz M, Chabrière E, Pontarotti P, La Scola B, Raoult D. 2016. MIMIVIRE is a defence system in mimivirus that confers resistance to virophage. Nature 531:249–252. doi: 10.1038/nature17146.
  • 38. Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075. doi: 10.1093/bioinformatics/btt086.
  • 39. Zhu W, Lomsadze A, Borodovsky M. 2010. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38:e132. doi: 10.1093/nar/gkq275.
  • 40. Huson DH, Auch AF, Qi J, Schuster SC. 2007. MEGAN analysis of metagenomic data. Genome Res 17:377–386. doi: 10.1101/gr.5969107.
  • 41. Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. 2005. InterProScan: protein domains identifier. Nucleic Acids Res 33:W116–W120. doi: 10.1093/nar/gki442.
  • 42. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH. 2011. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39:D225–D229. doi: 10.1093/nar/gkq1189.
  • 43. Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. doi: 10.1186/1471-2105-5-113.
  • 44. Price MN, Dehal PS, Arkin AP. 2010. FastTree 2: approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490. doi: 10.1371/journal.pone.0009490.
  • 45. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35:1547–1549. doi: 10.1093/molbev/msy096.
  • 46. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, Bork P. 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol 34:2115–2122. doi: 10.1093/molbev/msx148.
  • 47. Söding J, Biegert A, Lupas AN. 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33:W244–W248. doi: 10.1093/nar/gki408.
  • 48. Schwede T, Kopp J, Guex N, Peitsch MC. 2003. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 31:3381–3385. doi: 10.1093/nar/gkg520.
  • 49. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948. doi: 10.1093/bioinformatics/btm404.

Associated Data


Supplementary Materials

Supplemental file 1

Data Availability Statement

The data sets generated and/or analyzed during the current study are available in GenBank under accession numbers MN940570 (DSLV2), MN940572 (DSLV3), MN940571 (DSLV4), MN940574 (DSLV5), MN940573 (DSLV6), MN940576 (DSLV7), MN940575 (DSLV8), MN940577 (DSLPV2), MN940578 (DSLPV3), MN940579 (DSLPV4), and MN940580 (DSLLAV1). All data generated or analyzed during this study are included in this published article and its supplementary information files.
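Because the accession numbers do not run in the same order as the genome names (for example, MN940571 is DSLV4 while MN940572 is DSLV3), retrieving the records by hand is easy to get wrong. The following is a minimal sketch, not part of the original study, that keeps the name-to-accession mapping from the statement above in one table and builds a request URL for NCBI's public E-utilities `efetch` endpoint (`db=nuccore`). The helper name `efetch_url` is hypothetical, and the sketch only constructs the URL; it sends no network request.

```python
from urllib.parse import urlencode

# GenBank accession numbers exactly as listed in the Data Availability Statement.
ACCESSIONS = {
    "DSLV2": "MN940570",
    "DSLV3": "MN940572",
    "DSLV4": "MN940571",
    "DSLV5": "MN940574",
    "DSLV6": "MN940573",
    "DSLV7": "MN940576",
    "DSLV8": "MN940575",
    "DSLPV2": "MN940577",
    "DSLPV3": "MN940578",
    "DSLPV4": "MN940579",
    "DSLLAV1": "MN940580",
}

# NCBI E-utilities efetch endpoint (assumed usage; nothing is fetched here).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(names, rettype="fasta"):
    """Hypothetical helper: build one efetch URL for the named genomes.

    rettype="fasta" requests sequences; rettype="gb" would request
    GenBank flat-file records instead.
    """
    ids = ",".join(ACCESSIONS[name] for name in names)
    query = urlencode({"db": "nuccore", "id": ids, "rettype": rettype, "retmode": "text"})
    return f"{EFETCH}?{query}"


# Select the seven virophage genomes (names beginning with "DSLV").
virophages = sorted(name for name in ACCESSIONS if name.startswith("DSLV"))
print(efetch_url(virophages))
```

Keeping the mapping in a single dictionary means every lookup goes through the published name, so a transposed accession such as MN940571/MN940572 cannot silently swap DSLV3 and DSLV4.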