Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade

ABSTRACT _P. cynomolgi_, a malaria-causing parasite of Asian Old World monkeys, is the sister taxon of _P. vivax_, the most prevalent malaria-causing species in humans outside of Africa.


Because _P. cynomolgi_ shares many phenotypic, biological and genetic characteristics with _P. vivax_, we generated draft genome sequences for three _P. cynomolgi_ strains and performed


genomic analysis comparing them with the _P. vivax_ genome, as well as with the genome of a third previously sequenced simian parasite, _Plasmodium knowlesi_. Here, we show that genomes of


the monkey malaria clade can be characterized by copy-number variants (CNVs) in multigene families involved in evasion of the human immune system and invasion of host erythrocytes. We


identify genome-wide SNPs, microsatellites and CNVs in the _P. cynomolgi_ genome, providing a map of genetic variation that can be used to map parasite traits and study parasite populations.


The sequencing of the _P. cynomolgi_ genome is a critical step in developing a model system for _P. vivax_ research and in counteracting the neglect of _P. vivax_.


MAIN Human malaria is transmitted by anopheline mosquitoes and is caused by


four species in the genus _Plasmodium_. Of these, _P. vivax_ is the major malaria agent outside of Africa, annually causing 80–100 million cases1. Although _P. vivax_ infection is often


mistakenly regarded as benign and self-limiting, _P. vivax_ treatment and control present challenges distinct from those of the more virulent _Plasmodium falciparum_. Biological traits,


including a dormant (hypnozoite) liver stage responsible for recurrent infections (relapses), early infective sexual stages (gametocytes) and transmission from low parasite densities in the


blood2, coupled with emerging antimalarial drug resistance3, render _P. vivax_ resilient to modern control strategies. Recent evidence indicates that _P. falciparum_ derives from parasites


of great apes in Africa4, whereas _P. vivax_ is more closely related to parasites of Asian Old World monkeys5,6,7, although not itself infective of these monkeys. _P. vivax_ cannot be


cultured _in vitro_, and the small New World monkeys capable of hosting it are rare and do not provide an ideal model system. _P. knowlesi_, an Asian Old World monkey parasite recently


recognized as a zoonosis for humans8, has had its genome sequenced9, but the species is distantly related to _P. vivax_ and is phenotypically dissimilar. In contrast, _P. cynomolgi_, a


simian parasite that can infect humans experimentally10, is the closest living relative (a sister taxon) to _P. vivax_ and possesses most of the same genetic, phenotypic and biological


characteristics—notably, periodic relapses caused by dormant hypnozoites, early infectious gametocyte formation and invasion of Duffy blood group–positive reticulocytes. _P. cynomolgi_ thus


offers a robust model for _P. vivax_ in a readily available laboratory host, the Rhesus monkey, whose genome was recently sequenced11. Here, we report draft genome sequences of three _P.


cynomolgi_ strains and comparative genomic analyses of _P. cynomolgi_, _P. vivax_12 and _P. knowlesi_9, three members of the monkey malaria clade. We sequenced the genome of _P. cynomolgi_


strain B, isolated from a monkey in Malaysia and grown in splenectomized monkeys (Online Methods). A combination of Sanger, Roche 454 and Illumina chemistries was employed to generate a


high-quality reference assembly at 161-fold coverage, consisting of 14 supercontigs (corresponding to the 14 parasite chromosomes) and ∼1,649 unassigned contigs, comprising a total length of


∼26.2 Mb (Supplementary Table 1). Comparing genomic features of _P. cynomolgi_, _P. knowlesi_ and _P. vivax_ reveals many similarities, including GC content (mean GC content of 40.5%), 14


positionally conserved centromeres and the presence of intrachromosomal telomeric sequences (ITSs; GGGTT(T/C)A), which were discovered in the _P. knowlesi_ genome9 but are absent in _P.


vivax_ (Fig. 1, Table 1 and Supplementary Table 2). We annotated the _P. cynomolgi_ strain B genome using a combination of _ab initio_ gene prediction programs trained on high-quality data


sets and sequence similarity searches against the annotated _P. vivax_ and _P. knowlesi_ genomes. Not unexpectedly for species from the same monkey malaria clade, gene synteny along the 14


chromosomes is highly conserved, although numerous microsyntenic breaks are present in regions containing multigene families (Fig. 2 and Table 2). This genome-wide view of synteny in six


species of _Plasmodium_ also identified two apparent errors in existing public sequence databases: an inversion in chromosome 3 of _P. knowlesi_ and an inversion in chromosome 6 of _P.


vivax_. The _P. cynomolgi_ genome contains 5,722 genes, of which approximately half encode conserved hypothetical proteins of unknown function, as is the case in all the _Plasmodium_ genomes


sequenced to date. A maximum-likelihood phylogenetic tree constructed using 192 conserved ribosomal and translation- and transcription-related genes (Supplementary Fig. 1) confirms the


close relationship of _P. cynomolgi_ to _P. vivax_ compared to five other _Plasmodium_ species. Approximately 90% of genes (4,613) have reciprocal best-match orthologs in all three species


(Fig. 3), enabling refinement of the existing _P. vivax_ and _P. knowlesi_ annotations (Supplementary Table 3). The high degree of gene orthology enabled us to identify specific examples of


gene duplication (an important vehicle for genome evolution), including a duplicated homolog of _P. vivax Pvs28_—which encodes a sexual stage surface antigen that is a transmission-blocking


vaccine candidate13—in _P. cynomolgi_ (Supplementary Table 4). Genes common only to _P. cynomolgi_ and _P. vivax_ (_n_ = 214) outnumber those that are restricted to _P. cynomolgi_ and _P.


knowlesi_ (_n_ = 100) or _P. vivax_ and _P. knowlesi_ (_n_ = 17). Such figures establish the usefulness of _P. cynomolgi_ as a model species for studying the more intractable _P. vivax._
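The reciprocal best-match criterion behind these ortholog counts can be illustrated with a minimal sketch. It assumes pairwise similarity scores (for example, BLAST bit scores) are already available as nested dictionaries; the function names and gene identifiers are hypothetical, and the sketch does not reproduce the synteny checks and manual inspection described in Methods.

```python
from typing import Dict, Set, Tuple

# query gene -> {subject gene: similarity score}; higher means more similar
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene.

    Ties are broken by dictionary order, which is an arbitrary choice
    made for this illustration.
    """
    return {q: max(subj, key=subj.get) for q, subj in scores.items() if subj}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs in which a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Toy data with hypothetical gene names (not real PCYB_ or PVX_ identifiers).
cyn_vs_viv = {
    "cynA": {"vivA": 900.0, "vivB": 300.0},
    "cynB": {"vivB": 500.0},
    "cynC": {"vivA": 400.0},
}
viv_vs_cyn = {
    "vivA": {"cynA": 880.0, "cynC": 410.0},
    "vivB": {"cynB": 480.0, "cynA": 310.0},
}
pairs = reciprocal_best_hits(cyn_vs_viv, viv_vs_cyn)
```

In this toy case `cynC` is excluded because its best match, `vivA`, matches `cynA` better in the reverse direction. Extending the idea to three species would require each pair of species to agree, which is one plausible reading of "in all three species" but is an assumption here.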


Notably, most of the genes specific to a particular species belong to multigene families (excluding hypothetical genes; Table 2 and Supplementary Table 5). This suggests repeated


lineage-specific gene duplication and/or gene deletion in multigene families within the three monkey malaria clade species. Moreover, copy numbers of the genes composing multigene families


were generally greater in the _P. cynomolgi–P. vivax_ lineage than in _P. knowlesi_, suggesting repeated gene duplication in the ancestral lineage of _P. cynomolgi_ and _P. vivax_ (or


repeated gene deletion in the _P. knowlesi_ lineage). Thus, the genomes of _P. cynomolgi_, _P. vivax_ and _P. knowlesi_ can largely be distinguished by variations in the copy number of


multigene family members. Examples of such families include those that encode proteins involved in evasion of the human immune system (_vir_, _kir_ and _SICAvar_) and invasion of host red


blood cells (_dbp_ and _rbp_). In malaria-causing parasites, invasion of host erythrocytes, mediated by specific interactions between parasite ligands and erythrocyte receptors, is a crucial


component of the parasite lifecycle. Of great interest are the _ebl_ and _rbl_ gene families, which encode parasite ligands required for the recognition of host erythrocytes. The _ebl_


genes encode erythrocyte binding–like (EBL) ligands such as the Duffy-binding proteins (DBPs) that bind to Duffy antigen receptor for chemokines (DARC) on human and monkey erythrocytes. The


_rbl_ genes encode the reticulocyte binding–like (RBL) protein family, including reticulocyte-binding proteins (RBPs) in _P. cynomolgi_ and _P. vivax_, and normocyte-binding proteins (NBPs)


in _P. knowlesi_, which bind to unknown erythrocyte receptors14. We confirmed the presence of two _dbp_ genes in _P. cynomolgi_15 (Supplementary Table 6), in contrast to the one _dbp_ and


three _dbp_ genes identified in _P. vivax_ and _P. knowlesi_, respectively. This raises an intriguing hypothesis that _P. vivax_ lost one _dbp_ gene, and thus its infectivity of Old World


monkey erythrocytes, after divergence from a common _P. vivax_–_P. cynomolgi_ ancestor. This hypothesis is also supported by our identification of single-copy _dbp_ genes in two other


closely related Old World monkey malaria-causing parasites, _Plasmodium fieldi_ and _Plasmodium simiovale_, which are incapable of infecting humans16. During their divergence, these two Old World monkey parasite species lost


one or more of the _dbp_ genes that confer infectivity to humans, whereas _P. cynomolgi_ and _P. knowlesi_ retained _dbp_ genes that allow invasion of human erythrocytes


(Supplementary Fig. 2). We found multiple _rbp_ genes, some truncated or present as pseudogenes, in the _P. cynomolgi_ genome (Fig. 1 and Table 2). Phylogenetic analysis showed that _rbl_


genes from _P. cynomolgi_, _P. vivax_ and _P. knowlesi_ can be classified into three distinct groups, RBP/NBP-1, RBP/NBP-2 and RBP/NBP-3 (Supplementary Fig. 3), and suggests that these


groups existed before the three species diverged. All three groups of RBP/NBP are represented in _P. cynomolgi_, whereas _P. vivax_ and _P. knowlesi_ lack functional genes from the RBP/NBP-3


and RBP/NBP-1 groups, respectively. Thus, _rbl_ gene family expansion seems to have occurred after speciation, indicating that the three species have multiple species-specific erythrocyte


invasion mechanisms. Notably, we found an ortholog of _P. vivax rbp1b_ in some strains of _P. cynomolgi_ but not in others (Supplementary Table 6). To our knowledge, this is the first


example of a CNV for an _rbp_ gene between strains of a single _Plasmodium_ species, highlighting how repeated creation and destruction of _rbl_ genes, a signature of adaptive evolution, may


have enabled species of the monkey malaria clade to expand or switch between monkey and human hosts. The largest gene family in _P. cynomolgi_, consisting of 256 _cyir_


(cynomolgi-interspersed repeat) genes, is part of the _pir_ (plasmodium-interspersed repeat) superfamily that includes _P. vivax vir_ genes (_n_ = 319) and _P. knowlesi kir_ genes (_n_ = 70)


(Table 2). _Pir_-encoded proteins reside on the surface of infected erythrocytes and have an important role in immune evasion17. Most _cyir_ genes have sequence similarity to _P. vivax vir_


genes (_n_ = 254; Supplementary Table 7) and are found in subtelomeric regions (Fig. 1), but, notably, 11 _cyir_ genes have sequence similarity to _P. knowlesi kir_ genes (Supplementary


Table 7) and occur more internally in the chromosomes, as do the _kir_ genes in _P. knowlesi_. As with 'molecular mimicry' in _P. knowlesi_ (mimicry of host sequences by pathogen


sequences)9, one CYIR protein (encoded by PCYB_032250) has a region of 56 amino acids that is highly similar to the extracellular domain of primate CD99 (Supplementary Fig. 4), a molecule


involved in the regulation of T-cell function. A new finding is that _P. cynomolgi_ has two genes whose sequences are similar to _P. knowlesi SICAvar_ genes (Supplementary Table 7) that are


expressed on the surfaces of schizont-infected macaque erythrocytes and are involved in antigenic variation18. The ability to form a dormant hypnozoite stage is common to both _P. cynomolgi_


and _P. vivax_ and was first shown in laboratory infections of monkeys by mosquito-transmitted _P. cynomolgi_19. In a search for candidate genes involved in the hypnozoite stage, we


identified nine coding for 'dormancy-related' proteins that had the upstream ApiAP2 motifs20 necessary for stage-specific transcriptional regulation at the sporozoite


(pre-hypnozoite) stage (Supplementary Table 8). The candidates include kinases that are involved in cell cycle transition; hypnozoite formation may be regulated by phosphorylation of


proteins specifically expressed at the pre-hypnozoite stage. Our list of _P. cynomolgi_ candidate genes represents an informed starting point for experimental studies of this elusive stage.
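The upstream motif screen described here can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the motif sequences are the four ApiAP2 motifs listed in Methods, and scanning both strands is an assumption that the text does not state.

```python
# The four ApiAP2 motifs named in Methods (ref. 20), keyed by the
# P. falciparum ApiAP2 domain identifiers given there.
APIAP2_MOTIFS = {
    "PF14_0633": "GCATGC",
    "PF13_0235_D1": "GCCCCG",
    "PFF0670w_D1": "TAAGCC",
    "PFD0985w_D2": "TGTTAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def scan_upstream(upstream, motifs=APIAP2_MOTIFS):
    """Find every motif occurrence in an upstream region (e.g. ~1 kb 5' of a CDS).

    Returns {motif_name: [(0-based position, strand), ...]}; motifs with
    no occurrence are omitted. Searching the minus strand is an assumption.
    """
    seq = upstream.upper()
    hits = {}
    for name, motif in motifs.items():
        patterns = {"+": motif}
        rc = reverse_complement(motif)
        if rc != motif:  # palindromic motifs (e.g. GCATGC) are reported once
            patterns["-"] = rc
        found = []
        for strand, pattern in patterns.items():
            start = seq.find(pattern)
            while start != -1:
                found.append((start, strand))
                start = seq.find(pattern, start + 1)  # allow overlapping hits
        if found:
            hits[name] = sorted(found)
    return hits


# Toy sequence: GCATGC at position 2, and GTAACA (reverse complement of
# TGTTAC) at position 10.
example_hits = scan_upstream("AAGCATGCTTGTAACATT")
# example_hits == {"PF14_0633": [(2, "+")], "PFD0985w_D2": [(10, "-")]}
```

A candidate gene would then be retained if its upstream region returns a non-empty result; how many motifs were required per gene is not specified in the text.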


We sequenced _P. cynomolgi_ strains Berok (from Malaysia) and Cambodian (from Cambodia) to 26× and 17× coverage, respectively, to characterize _P. cynomolgi_ genome-wide diversity through


analysis of SNPs, CNVs and microsatellites. A comparison of the three _P. cynomolgi_ strains identified 178,732 SNPs (Supplementary Table 9) at a frequency of 1 SNP per 151 bp, a


polymorphism level somewhat similar to that found when _P. falciparum_ genomes are compared21,22. We calculated the pairwise nucleotide diversity (_π_) as 5.41 × 10⁻³ across the genome,


which varies little between the chromosomes. We assessed genome-wide CNVs between the _P. cynomolgi_ B and Berok strains, using a robust statistical model in the CNV-seq program23, by which


we identified 1,570 CNVs (1 per 17 kb), including 1 containing the _rbp1b_ gene on chromosome 7 (Supplementary Fig. 5). Finally, mining of the _P. cynomolgi_ B and Berok strains identified


182 polymorphic intergenic microsatellites (Supplementary Table 10), the first set of genetic markers developed for this species. These provide a toolkit for studies of genetic diversity and


population structure of laboratory stocks or natural infections of _P. cynomolgi_, many of which have recently been isolated from screening hundreds of wild monkeys for the zoonosis _P.


knowlesi_24. We estimated the difference between the number of synonymous changes per synonymous site (dS) and the number of nonsynonymous changes per nonsynonymous site (dN) over 4,563


pairs of orthologs within _P. cynomolgi_ strains B and Berok and 4,601 pairs of orthologs between these two _P. cynomolgi_ strains and _P. vivax_ Salvador I, using a simple Nei-Gojobori


model25. We found 63 genes with dN > dS within the two _P. cynomolgi_ strains and 3,265 genes with dS > dN (Supplementary Table 11). Genes with relatively high dN/dS ratios include


those encoding transmembrane proteins, such as antigens and transporters, among which is a transmission-blocking target antigen, Pcyn230 (encoded by PCYB_042090). Notably, the _P. vivax_


ortholog (PVX_003905) does not show evidence for positive selection26, suggesting species-specific positive selection. We explored the degree to which evolution of orthologs has been


constrained between _P. cynomolgi_ and _P. vivax_ and found 83 genes under possible accelerated evolution but 3,739 genes under possible purifying selection (Supplementary Table 12). This


conservative estimate indicates that at least 81% of loci have diverged under strong constraint, compared with 1.8% of loci under less constraint or positive selection (Fig. 1), indicating


that, overall, the genome of _P. cynomolgi_ is highly conserved in single-locus genes compared to _P. vivax_ and emphasizing the value of _P. cynomolgi_ as a biomedical and evolutionary


model for studying _P. vivax._ Our generation of the first _P. cynomolgi_ genome sequences is a critical step in the development of a robust model system for the intractable and neglected


_P. vivax_ species27. Comparative genome analysis of _P. vivax_ and the Old World monkey malaria-causing parasites _P. cynomolgi_ and _P. knowlesi_ presented here provides the foundation for


further insights into traits such as host specificity that will enhance prospects for the eventual elimination of vivax-caused malaria and global malaria eradication.


URLS. PlasmoDB, http://plasmodb.org/; Circos, http://circos.ca/; MIcroSAtellite Identification tool (MISA), http://pgrc.ipk-gatersleben.de/misa/; dbSNP, http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewBatch.cgi?sbid=1056645.


METHODS PARASITE MATERIAL. Details of the origin of the _P. cynomolgi_ B, Berok and Cambodian strains, their growth


in macaques and isolation of parasite material are given in the Supplementary Note. GENOME SEQUENCING AND ASSEMBLY. _P. cynomolgi_ B strain was sequenced using the Roche 454 GS FLX


(Titanium) and Illumina/Solexa Genome Analyzer IIx platforms to 161× coverage. In addition, 2,784 clones (6.8 Mb) of a ∼40-kb insert fosmid library in pCC1FOS (EpiCentre Biotechnologies) were


sequenced by the Sanger method. A draft assembly of strain B was constructed using a combination of automated assembly and manual gap closure. We first generated _de novo_ contigs by


assembling Roche 454 reads using GS _De novo_ Assembler version 2.0 with default parameters. Contigs of >500 bp were mapped to the _P. vivax_ Salvador I reference assembly12 (PlasmoDB;


see URLs). _P. cynomolgi_ contigs were iteratively arrayed through alignment to _P. vivax_–assembled sequences with manual corrections. A total of 1,264 aligned contigs were validated by


mapping paired-end reads from fosmid clones using blastn (_e_ < 1 × 10⁻¹⁵; identity > 90%; coverage > 200 bp) implemented in GenomeMatcher software version 1.65 (ref. 28). Additional


linkages (699 regions) were made using PCR across the intervening sequence gaps with primers designed from neighboring contigs. The length of sequence gaps was estimated from insert lengths


of the fosmid paired-end reads, the size of PCR products and homologous sequences of the _P. vivax_ genome. Supercontigs were then manually constructed from the aligned contigs. Eventually,


we obtained 14 supercontigs corresponding to the 14 chromosomes of the parasite, with a total length of ∼22.73 Mb, encompassing ∼80% of the predicted _P. cynomolgi_ genome. A total of 1,651


contigs (>1 kb) with a total length of 3.45 Mb was identified as unassigned subtelomeric sequences by searching against the _P. vivax_ genome using blastn. Additionally, to improve


sequence accuracy, we constructed a mapping assembly of Illumina paired-end reads, with the 14 supercontigs and unassigned contigs as reference sequences, using CLC Genomics Workbench version


3.0 with default settings (CLC Bio). Comparison of the draft _P. cynomolgi_ B sequence with 23 _P. cynomolgi_ protein-coding genes (64 kb) obtained by Sanger sequencing showed 99.8% sequence


identity (Supplementary Table 13). The _P. cynomolgi_ Berok and Cambodian strains were sequenced to 26× and 17× coverage, respectively, using the Roche 454 GS FLX platform, with single-end


and 3-kb paired-end libraries made for the former and a single-end library only made for the latter. For phylogenetic analyses of specific genes, sequences were independently verified by


Sanger sequencing (Supplementary Table 14 and Supplementary Note). PREDICTION AND ANNOTATION OF GENES. Gene prediction for the 14 supercontigs and 1,651 unassigned contigs was performed


using the MAKER genome annotation pipeline29 with _ab initio_ gene prediction programs trained on proteins and ESTs from PlasmoDB Build 7.1. For gene annotation, blastn (_e_ < 1 × 10⁻¹⁵;


identity > 70%; coverage > 100 bp) searches of _P. vivax_ (PvivaxAnnotatedTranscripts_PlasmoDB-7.1.fasta) and _P. knowlesi_ (PknowlesiAnnotatedTranscripts_PlasmoDB-7.1.fasta) predicted


proteomes were run, and the best hits were identified. All predicted genes were manually inspected at least twice for gene structure and functional annotation, and orthologous relationships


between _P. cynomolgi_, _P. vivax_ and _P. knowlesi_ were determined on the basis of synteny. A unique identifier, PCYB_######, was assigned to _P. cynomolgi_ genes, where the first two of the six


numbers indicate chromosome number. Paralogs of genes that seemed to be specific to either _P. cynomolgi_, _P. vivax_ or _P. knowlesi_ were searched using blastp with default parameters,


using a cutoff _e_ value of 1 × 10⁻¹⁶. MULTIPLE GENOME SEQUENCE ALIGNMENT. Predicted proteins of _P. cynomolgi_ B strain were concatenated and aligned with those from the 14 chromosomes of 5


other _Plasmodium_ genomes: _P. vivax_, _P. knowlesi_, _P. falciparum_, _P. berghei_ and _P. chabaudi_, using Murasaki software version 1.68.6 (ref. 30). SEARCH FOR SEQUENCE SHOWING HIGH


SIMILARITY TO HOST PROTEINS. Eleven _P. cynomolgi_ CYIR proteins (with sequence similarity to _P. knowlesi_ KIR) were subjected to blastp search for regions having high similarity to host


_Macaca mulatta_ CD99 protein, with a cutoff _e_ value of 1 × 10⁻¹² and no compositional adjustment, against the nonredundant protein sequence data set of the _M. mulatta_


proteome in NCBI. PHYLOGENETIC ANALYSES. Genes were aligned using ClustalW version 2.0.10 (ref. 31) with manual corrections, and unambiguously aligned sites were selected for phylogenetic


analyses. Maximum-likelihood phylogenetic trees were constructed using PROML programs in PHYLIP version 3.69 (ref. 32) under the Jones-Taylor-Thornton (JTT) amino-acid substitution model. To


take the evolutionary rate heterogeneity across sites into consideration, the R (hidden Markov model rates) option was set for discrete γ distribution, with eight categories for


approximating the site-rate distribution. CODEML programs in PAML 4.4 (ref. 33) were used for estimating the γ shape parameter, α values. For bootstrap analyses, SEQBOOT and CONSENSE


programs in PHYLIP were applied. CANDIDATE GENES FOR HYPNOZOITE FORMATION. We undertook two approaches. First, genes unique to _P. vivax_ and _P. cynomolgi_ (hypnozoite-forming parasites)


and not found in other non-hypnozoite–forming _Plasmodium_ species were identified. We used the 147 unique genes identified in the _P. vivax_ genome12 to search the _P. cynomolgi_ B


sequence. For the orthologs identified in both species, ∼1 kb of sequence 5′ to the coding sequence was searched for four specific ApiAP2 motifs20 (PF14_0633, GCATGC; PF13_0235_D1, GCCCCG;


PFF0670w_D1, TAAGCC; and PFD0985w_D2, TGTTAC), which are involved in sporozoite stage–specific regulation and expression (corresponding to the pre-hypnozoite stage). Second, dormancy-related


proteins were retrieved from GenBank and used to search for _P. vivax_ homologs. Candidate genes (_n_ = 128) and orthologs of _P. cynomolgi_ and five other parasite species were searched in


the region ∼1 kb upstream of the coding sequence for the presence of the four ApiAP2 motifs. Data for _P. vivax_, _P. knowlesi_, _P. falciparum_, _P. berghei_, _Plasmodium chabaudi_ and


_Plasmodium yoelii_ were retrieved from PlasmoDB Build 7.1. GENOME-WIDE SCREEN FOR POLYMORPHISMS. For SNP identification, alignment of Roche 454 data from strains B, Berok and Cambodian was


performed using SSAHA2 (ref. 34), with 0.1 mismatch rate and only unique matches reported. Potential duplicate reads generated during PCR amplification were removed, so that when multiple


reads mapped at identical coordinates, only the reads with the highest mapping quality were retained. We used a statistical method35 implemented in SAMtools version 0.1.18 to call SNPs


simultaneously in the case of duplicate runs of the same strain. SNPs with high read depth (>100) were filtered out, as were SNPs in poor alignment regions at the ends of chromosomes


(Supplementary Note). Nucleotide diversity (_π_) was calculated as follows. For each site being compared, we calculated allele frequency by counting the two alleles and measured the


proportion of nucleotide differences. Letting $\pi_{ij}$ be the genetic distance between allele _i_ and allele _j_, then the nucleotide diversity within the population is $\pi = \sum_{ij} P_i P_j \pi_{ij}$, where $P_i$ and $P_j$


are the overall allele frequencies of _i_ and _j_, respectively. Mean _π_ was calculated by averaging over sites, weighting each by , where _n_ is the number of aligned sites. Average dN/dS


ratios were estimated using the modified Nei-Gojobori/Jukes-Cantor method in MEGA 4 (ref. 36). CNV-seq23 was used to identify potential CNVs in _P. cynomolgi_. Briefly, this method is based


on a statistical model that allows confidence assessment of observed copy-number ratios from next-generation sequencing data. Roche 454 sequences from _P. cynomolgi_ strain B assembly were


used as the reference genome, and the _P. cynomolgi_ Berok strain was used as a test genome; the sequence coverage of the Cambodian strain was considered too low for analysis. The test reads


were mapped to the reference genome, and CNVs were detected by computing the number of reads for each test strain in a sliding window. The validity of the observed ratios was assessed by


the computation of a probability of a random occurrence, given no copy-number variation. Polymorphic microsatellites (defined as repeat units of 1–6 nucleotides) between _P. cynomolgi_


strains B and Berok were identified by aligning contigs from a _de novo_ assembly of Berok (generated using Roche GS Assembler version 2.6, with 40-bp minimum overlap, 90% identity) to the B


strain using the Burrows-Wheeler Aligner (BWA)37 and allowing for gaps. Using the Phred-scaled probability of the base being misaligned by SAMtools35, indel candidates were called from the


alignment. In-house Python scripts were used to then cross-reference with the microsatellites found in the reference strain B assembly identified by MISA (see URLs). All homopolymer


microsatellites were discarded to account for potential sequence errors introduced by 454 sequencing. Selective constraint analysis of 4,563 orthologs between _P. cynomolgi_ strains B and


Berok and 4,601 orthologs between these strains and _P. vivax_ Salvador I used MUSCLE38 alignments with stringent removal of gaps and missing data (_P. cynomolgi_ Berok orthologs were


identified through a reciprocal best-hit BLAST search against strain B genes). Analyses were conducted using the Nei-Gojobori model25. To detect values that could not be explained by chance,


we estimated the standard error by a bootstrap procedure with 200 pseudoreplicates for each gene. The expected value for dS − dN is 0 if a given pair of sequences is diverging without obvious


effects on fitness. In the case of the comparison within _P. cynomolgi_, values differing from 0 by more than 2 s.e.m. were considered indicative of an excess of synonymous (dS − dN > 0)


or nonsynonymous (dS − dN < 0) changes. In the case of the comparison between _P. cynomolgi_ and _P. vivax_, we used a more stringent criterion of ± 3 s.e.m. from 0. ACCESSION CODES.


Sequence data for the _P. cynomolgi_ B, Cambodian and Berok strains have been deposited in the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and the GenBank


databases under the following accessions: B strain sequence reads DRA000196, genome assembly BAEJ01000001 – BAEJ01003341 and annotation DF157093 – DF158755; Cambodian strain sequence reads


DRA000197; and Berok strain sequence reads SRA047950. SNP calls have been submitted to dbSNP (NYU_CGSB_BIO; 1056645) and may also be downloaded from the dbSNP website (see URLs). Sequences


of the _dbp_ genes from _P. cynomolgi_ (Cambodian strain), _P. fieldi_ (A.b.i. strain) and _P. simiovale_ (AB617788 – AB617791) and the _P. cynomolgi_ Berok strain (JQ422035 – JQ422036) and


_rbp_ gene sequences from the _P. cynomolgi_ Berok and Cambodian strains (JQ422037 – JQ422050) have been deposited. A partial apicoplast genome of the _P. cynomolgi_ Berok strain has been


deposited (JQ522954). The _P. cynomolgi_ B reference genome is also available through PlasmoDB (see URLs).


ACCESSION CODES Primary accessions: DDBJ/GenBank/EMBL, DRA000196 and DRA000197; NCBI Reference Sequence, AB617788, AB617791, DF157093, DF158755, JQ422035, JQ422036, JQ422037, JQ422050 and JQ522954; Sequence Read Archive, SRA047950. Referenced accessions: NCBI Reference Sequence, AB444108, AB444123 and AY598140.


REFERENCES

* Mendis, K., Sina, B.J., Marchesini, P. & Carter, R. The neglected burden of _Plasmodium vivax_ malaria. _Am. J. Trop. Med. Hyg._ 64, 97–106 (2001).
* Mueller, I. et al. Key gaps in the knowledge of _Plasmodium vivax_, a neglected human malaria parasite. _Lancet Infect. Dis._ 9, 555–566 (2009).
* Baird, J.K. Resistance to chloroquine unhinges vivax malaria therapeutics. _Antimicrob. Agents Chemother._ 55, 1827–1830 (2011).
* Rayner, J.C., Liu, W., Peeters, M., Sharp, P.M. & Hahn, B.H. A plethora of _Plasmodium_ species in wild apes: a source of human infection? _Trends Parasitol._ 27, 222–229 (2011).
* Cornejo, O.E. & Escalante, A.A. The origin and age of _Plasmodium vivax_. _Trends Parasitol._ 22, 558–563 (2006).
* Escalante, A.A. et al. A monkey's tale: the origin of _Plasmodium vivax_ as a human malaria parasite. _Proc. Natl. Acad. Sci. USA_ 102, 1980–1985 (2005).
* Mu, J. et al. Host switch leads to emergence of _Plasmodium vivax_ malaria in humans. _Mol. Biol. Evol._ 22, 1686–1693 (2005).
* Singh, B. et al. A large focus of naturally acquired _Plasmodium knowlesi_ infections in human beings. _Lancet_ 363, 1017–1024 (2004).
* Pain, A. et al. The genome of the simian and human malaria parasite _Plasmodium knowlesi_. _Nature_ 455, 799–803 (2008).
* Eyles, D.E., Coatney, G.R. & Getz, M.E. Vivax-type malaria parasite of macaques transmissible to man. _Science_ 131, 1812–1813 (1960).
* Gibbs, R.A. et al. Evolutionary and biomedical insights from the rhesus macaque genome. _Science_ 316, 222–234 (2007).
* Carlton, J.M. et al. Comparative genomics of the neglected human malaria parasite _Plasmodium vivax_. _Nature_ 455, 757–763 (2008).
* Saxena, A.K., Wu, Y. & Garboczi, D.N. _Plasmodium_ p25 and p28 surface proteins: potential transmission-blocking vaccines. _Eukaryot. Cell_ 6, 1260–1265 (2007).
* Iyer, J., Gruner, A.C., Renia, L., Snounou, G. & Preiser, P.R. Invasion of host cells by malaria parasites: a tale of two protein families. _Mol. Microbiol._ 65, 231–249 (2007).
* Okenu, D.M., Malhotra, P., Lalitha, P.V., Chitnis, C.E. & Chauhan, V.S. Cloning and sequence analysis of a gene encoding an erythrocyte binding protein from _Plasmodium cynomolgi_. _Mol. Biochem. Parasitol._ 89, 301–306 (1997).
* Coatney, G.R., Collins, W.E., Warren, M. & Contacos, P.G. _The Primate Malarias_ (US Department of Health, Education and Welfare, Washington, DC, 1971).
* Cunningham, D., Lawton, J., Jarra, W., Preiser, P. & Langhorne, J. The _pir_ multigene family of _Plasmodium_: antigenic variation and beyond. _Mol. Biochem. Parasitol._ 170, 65–73 (2010).
* al-Khedery, B., Barnwell, J.W. & Galinski, M.R. Antigenic variation in malaria: a 3′ genomic alteration associated with the expression of a _P. knowlesi_ variant antigen. _Mol. Cell_ 3, 131–141 (1999).
* Krotoski, W.A. The hypnozoite and malarial relapse. _Prog. Clin. Parasitol._ 1, 1–19 (1989).
* Campbell, T.L., De Silva, E.K., Olszewski, K.L., Elemento, O. & Llinas, M. Identification and genome-wide prediction of DNA binding specificities for the ApiAP2 family of regulators from the malaria parasite. _PLoS Pathog._ 6, e1001165 (2010).
* Mu, J. et al. Genome-wide variation and identification of vaccine targets in the _Plasmodium falciparum_ genome. _Nat. Genet._ 39, 126–130 (2007).
* Volkman, S.K. et al. A genome-wide map of diversity in _Plasmodium falciparum_. _Nat. Genet._ 39, 113–119 (2007).
* Xie, C. & Tammi, M.T. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. _BMC Bioinformatics_ 10, 80 (2009).
* Lee, K.S. et al. _Plasmodium knowlesi_: reservoir hosts and tracking the emergence in humans and macaques. _PLoS Pathog._ 7, e1002015 (2011).
* Nei, M. & Gojobori, T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. _Mol. Biol. Evol._ 3, 418–426 (1986).
* Doi, M. et al. Worldwide sequence conservation of transmission-blocking vaccine candidate Pvs230 in _Plasmodium vivax_. _Vaccine_ 29, 4308–4315 (2011).
* Carlton, J.M., Sina, B.J. & Adams, J.H. Why is _Plasmodium vivax_ a neglected tropical disease? _PLoS Negl. Trop. Dis._ 5, e1160 (2011).
* Ohtsubo, Y., Ikeda-Ohtsubo, W., Nagata, Y. & Tsuda, M. GenomeMatcher: a graphical user interface for DNA sequence comparison. _BMC Bioinformatics_ 9, 376 (2008).
* Cantarel, B.L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. _Genome Res._ 18, 188–196 (2008).
* Popendorf, K., Tsuyoshi, H., Osana, Y. & Sakakibara, Y. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes. _PLoS ONE_ 5, e12651 (2010).
* Thompson, J.D., Higgins, D.G. & Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. _Nucleic Acids Res._ 22,


4673–4680 (1994). Article  CAS  Google Scholar  * Felsenstein, J. _PHYLIP, Phylogeny Inference Package_, 3.6a3 edn (University of Washington, Seattle, 2005). * Yang, Z. PAML 4: phylogenetic


analysis by maximum likelihood. _Mol. Biol. Evol._ 24, 1586–1591 (2007). Article  CAS  Google Scholar  * Ning, Z., Cox, A.J. & Mullikin, J.C. SSAHA: a fast search method for large DNA


databases. _Genome Res._ 11, 1725–1729 (2001). Article  CAS  Google Scholar  * Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population


genetical parameter estimation from sequencing data. _Bioinformatics_ 27, 2987–2993 (2011). Article  CAS  Google Scholar  * Tamura, K., Dudley, J., Nei, M. & Kumar, S. MEGA4: Molecular


Evolutionary Genetics Analysis (MEGA) software version 4.0. _Mol. Biol. Evol._ 24, 1596–1599 (2007). Article  CAS  Google Scholar  * Li, H. & Durbin, R. Fast and accurate short read


alignment with Burrows-Wheeler transform. _Bioinformatics_ 25, 1754–1760 (2009). Article  CAS  Google Scholar  * Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high


throughput. _Nucleic Acids Res._ 32, 1792–1797 (2004). Article  CAS  Google Scholar  Download references ACKNOWLEDGEMENTS We thank H. Sawai for suggestions on genome analysis, D. Fisher for


help with genome-wide evolutionary analyses and the NYU Langone Medical Center Genome Technology Core for access to Roche 454 sequencing equipment (funded by grant S10 RR026950 to J.M.C.


from the US National Institutes of Health (NIH)). Genome and phylogenetic analyses used the Genome Information Research Center in the Research Institute for Microbial Diseases at Osaka


University. This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan (18073013, 18GS03140013, 20390120 and 22406012) to K.T., an NIH


grant (R01 GM080586) to A.A.E. and a Burroughs Wellcome Fund grant (1007398) and an NIH International Centers of Excellence for Malaria Research grant (U19 AI089676-01) to J.M.C. The


content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. AUTHOR INFORMATION Author notes * Shin-Ichiro Tachibana Present address:


Career-Path Promotion Unit for Young Life Scientists, Kyoto University, Kyoto, Japan. * Jane M Carlton and Kazuyuki Tanabe: These authors jointly directed this work.


AUTHORS AND AFFILIATIONS * Laboratory of Malariology, Research Institute for Microbial Diseases, Osaka University, Suita, Japan Shin-Ichiro Tachibana, Hajime Honma & Kazuyuki Tanabe *


Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA Steven A Sullivan, Hyunjae R Kim, Patrick L Sutton, Rimma Shakhbatyan & Jane


M Carlton * Laboratory of Tropical Medicine and Parasitology, Institute of International Education and Research, Dokkyo Medical University, Shimotsuga, Japan Satoru Kawai * Genome


Information Research Center, Research Institute for Microbial Diseases, Osaka University, Suita, Japan Shota Nakamura, Naohisa Goto & Teruo Yasunaga * Department of Molecular


Protozoology, Research Institute for Microbial Diseases, Osaka University, Suita, Japan Nobuko Arisue, Nirianne M Q Palacpac, Hajime Honma, Masanori Yagi, Takahiro Tougan, Toshihiro Horii 


& Kazuyuki Tanabe * The Corporation for Production and Research of Laboratory Primates, Tsukuba, Japan Yuko Katakai * Department of Protozoology, Institute of Tropical Medicine (NEKKEN)


and Global COE (Centers of Excellence) Program, Nagasaki University, Nagasaki, Japan Osamu Kaneko * Department of Molecular and Cellular Parasitology, Graduate School of Medicine, Juntendo


University, Tokyo, Japan Toshihiro Mita * Department of Biomedical Chemistry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan Kiyoshi Kita * Tsukuba Primate Research


Center, National Institute of Biomedical Innovation, Tsukuba, Japan Yasuhiro Yasutomi * Center for Global Health, Centers for Disease Control and Prevention, Division of Parasitic Diseases


and Malaria, Atlanta, Georgia, USA John W Barnwell * Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, Arizona, USA Ananias A


Escalante


CONTRIBUTIONS K.T., J.M.C., A.A.E. and J.W.B. conceived and conducted the study. S.K., Y.K., Y.Y., S.-I.T. and J.W.B. provided _P. cynomolgi_


material. S.N., N.G., T.Y. and H.R.K. constructed a computing system for data processing, and S.-I.T., H.H., P.L.S., S.A.S. and H.R.K. performed scaffolding of contigs and manual annotation


of the predicted genes. S.N. performed sequence correction of supercontigs and gene prediction. S.-I.T., S.N., N.G., N.A., M.Y., O.K., K.T., H.R.K., R.S., S.A.S. and J.M.C. analyzed data.


S.-I.T., N.M.Q.P., T.T., T.M., K.K., J.M.C., T.H., A.A.E., J.W.B. and K.T. wrote the manuscript. CORRESPONDING AUTHORS Correspondence to Jane M Carlton or Kazuyuki Tanabe. ETHICS


DECLARATIONS COMPETING INTERESTS The authors declare no competing financial interests. SUPPLEMENTARY INFORMATION FOR _PLASMODIUM CYNOMOLGI_ GENOME SEQUENCES PROVIDE INSIGHT INTO _PLASMODIUM


VIVAX_ AND THE MONKEY MALARIA CLADE LIST OF ORTHOLOGS BETWEEN THE GENOMES OF _P. CYNOMOLGI, P. VIVAX_ AND _P. KNOWLESI_ (XLS 1812 KB) LIST OF MULTIGENE FAMILIES IN _P. CYNOMOLGI_, _P. VIVAX_


AND _P. KNOWLESI_ (XLS 114 KB) 41588_2012_BFNG2375_MOESM24_ESM.XLS List of _P. cynomolgi cyir_ genes, _P. vivax vir_ genes, _P. knowlesi kir_ genes, and _P. knowlesi SICAvar_ genes and


their homologs in _P. cynomolgi, P. vivax and P. knowlesi_ (XLS 159 kb) LIST OF POLYMORPHIC MICROSATELLITE LOCI IDENTIFIED BETWEEN _P. CYNOMOLGI_ STRAINS B AND BEROK. (XLSX 68 KB) DS-DN


WITHIN 4,605 ORTHOLOGS OF _P. CYNOMOLGI_ STRAINS B AND BEROK. (XLSX 346 KB) DS-DN BETWEEN 4,605 ORTHOLOGS OF _P. CYNOMOLGI_ STRAINS B AND BEROK, AND _P. VIVAX_ SALVADOR I. (XLSX 373 KB)


RIGHTS AND PERMISSIONS This article is distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike license (http://creativecommons.org/licenses/by-nc-sa/3.0/),


which permits distribution and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation, and derivative works


must be licensed under the same or similar license. ABOUT THIS ARTICLE CITE THIS ARTICLE Tachibana, SI., Sullivan, S., Kawai, S. _et al._ _Plasmodium cynomolgi_


genome sequences provide insight into _Plasmodium vivax_ and the monkey malaria clade. _Nat Genet_ 44, 1051–1055 (2012). https://doi.org/10.1038/ng.2375 * Received: 25


January 2012 * Accepted: 09 July 2012 * Published: 05 August 2012 * Issue Date: September 2012 * DOI: https://doi.org/10.1038/ng.2375