The 2013-2015 Ebola virus disease (EVD) epidemic is caused by the Makona variant of Ebola virus (EBOV). Early in the epidemic, genome sequencing provided insights into virus evolution and transmission, and offered important information for outbreak response. Here we analyze sequences from 232 patients sampled over 7 months in Sierra Leone, along with 86 previously released genomes from earlier in the epidemic. We confirm sustained human-to-human transmission within Sierra Leone and find no evidence for import or export of EBOV across national borders after its initial introduction. Using high-depth replicate sequencing, we observe both host-to-host transmission and recurrent emergence of intrahost genetic variants. We trace the increasing impact of purifying selection in suppressing the accumulation of nonsynonymous mutations over time. Finally, we note changes in the mucin-like domain of EBOV glycoprotein that merit further investigation. These findings clarify the movement of EBOV within the region and describe viral evolution during prolonged human-to-human transmission.
The 2013-2015 Western African Ebola virus disease (EVD) epidemic, caused by the Ebola virus (EBOV) Makona variant (Kuhn et al., 2014), is the largest EVD outbreak to date, with 26,648 cases and 11,017 deaths documented as of May 8, 2015 (WHO, 2015). The outbreak, first declared in March 2014 in Guinea and traced back to the end of 2013 (Baize et al., 2014), has also devastated the neighboring countries of Sierra Leone and Liberia, with additional cases scattered across the globe. Never before has an EBOV variant been transmitted among humans for such a sustained period of time.
Published EBOV Makona genomes from clinical samples obtained early in the outbreak in Guinea (three patients) and Sierra Leone (78 patients) (Baize et al., 2014; Gire et al., 2014) demonstrated that near-real-time sequencing could provide valuable information to researchers involved in the global outbreak response. Analysis of these genomes revealed that the outbreak likely originated from a single introduction into the human population in Guinea at the end of 2013 and was then sustained exclusively by human-to-human transmissions. Genomic sequencing further allowed the identification of numerous mutations emerging in the EBOV Makona genome over time. As a consequence, the evolutionary rate of the Makona variant over the timespan of the early phase of the outbreak could be estimated, and predictions made on the potential of this new EBOV variant to escape current candidate vaccines, therapeutics, and diagnostics (Kugelman et al., 2015).
While the insights gleaned from sequencing early in the outbreak informed public health efforts (Alizon et al., 2014; Stadler et al., 2014; Volz et al., 2014), the continued human-to-human spread of the virus raises questions about ongoing evolution and transmission of EBOV. Our laboratory teams in Sierra Leone, at Kenema (Kenema Government Hospital, KGH) and at Bo (US Centers for Disease Control and Prevention, CDC) continued to perform active diagnosis and surveillance in Sierra Leone following our initial study (Gire et al., 2014). After a 6-month delay of sample shipment due to regulatory uncertainty about inactivation protocols, we again began to determine EBOV genome sequences. We have sequenced samples at high depth and with technical replicates to characterize genetic diversity of EBOV both within (intrahost) and between (interhost) individuals. To support global outbreak termination efforts, we publicly released these genomes prior to publication as they were generated, starting with a first set of 45 sequences in December 2014 and continuing with regular releases of hundreds of sequences through May 2015.
Here we provide an analysis of 232 new, coding-complete EBOV Makona genomes from Sierra Leone. We compared these genomes to 86 previously available genomes: 78 unique genomes from Sierra Leone (Gire et al., 2014), 3 genomes from Guinea (Baize et al., 2014), and 5 from health-care workers infected in Sierra Leone and treated in Europe. We use this combined data set obtained from 318 EVD patients during the height of the epidemic in Sierra Leone and Guinea to better understand EBOV transmission within Sierra Leone and between countries. In addition, we use it to understand viral population dynamics within individual hosts, the impact of natural selection, and the characteristics of the now hundreds of new mutations that have emerged over the longer course of the epidemic.
##232 New Ebola Virus Makona Genomes from Sierra Leone
We performed massively parallel genome sequencing on 673 samples from two EVD patient cohorts. The first cohort included 575 blood samples from 484 EVD patients confirmed by laboratory staff at KGH from June 16 through September 28, 2014. The second cohort included blood samples from 88 EVD patients from throughout Sierra Leone confirmed at Bo by CDC laboratory staff from August 20, 2014 through January 10, 2015. Samples from both EVD cohorts were sequenced using previously described methods (Experimental Procedures, Matranga et al. (2014), Gire et al. (2014)).
We implemented a new computational pipeline, viral-ngs:v1.0.0, for viral genomic de novo assembly, intrahost variant calling, and genome analysis and annotation. This pipeline is available via open-source software (Park et al., 2015) and utilizes a generalized workflow engine to run on a wide variety of computer hardware configurations (Koster et al., 2012). Through a partnership with DNAnexus, this pipeline is also available in a secure cloud-compute environment to enable consistent analyses across laboratories with limited computational resources (Experimental Procedures).
Using this pipeline, we successfully assembled 232 EBOV Makona coding-complete genomes (150 from KGH and 82 from the CDC cohort, spanning 16 Jun to 26 Dec 2014). Each assembled sequence was at least 18.5 kb in length with a maximum of 6% ambiguous base calls per genome. The median assembly had 374X coverage, was 18.9 kb long, and had no ambiguous bases. Despite extensive sequencing, successful full genome assembly was difficult to obtain from the KGH cohort (73% failed genome assemblies; 374X mean coverage; Table S1), compared to a previous cohort from the same laboratory described in Gire et al. (2014) (11% failed genome assemblies; 2,000X mean coverage). The high assembly failure rate of the more recent KGH cohort is likely due to the mandatory in-country implementation of a new EVD sample deactivation protocol and to long delays for sample shipments amidst the outbreak response (see Experimental Procedures). In contrast, only 7% of samples from the CDC cohort failed to assemble. However, these samples had been pre-selected for sequencing based on high EBOV titers as estimated by qPCR. In addition, the CDC cohort samples were collected more recently, did not remain in lysis buffer for an extended period, and were subjected to a different sample deactivation protocol than the KGH cohort samples.
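The acceptance thresholds quoted above (at least 18.5 kb in length, at most 6% ambiguous base calls) can be expressed as a simple filter. The following is a minimal sketch, not the check implemented in viral-ngs; the function name `passes_assembly_qc`, its parameters, and the rule that every non-ACGT character counts as ambiguous are assumptions made for illustration.

```python
def passes_assembly_qc(seq, min_length=18_500, max_ambiguous_fraction=0.06):
    """Return True if an assembled genome meets the length and ambiguity thresholds.

    Assumptions (not stated in the text): length counts every called position,
    and any character other than A, C, G, or T is treated as ambiguous.
    """
    seq = seq.upper()
    if not seq or len(seq) < min_length:
        return False
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return ambiguous / len(seq) <= max_ambiguous_fraction


# Toy sequences for illustration only; these are not EBOV genomes.
clean = "ACGT" * 4_750                       # 19,000 nt, no ambiguous calls
too_short = "ACGT" * 4_500                   # 18,000 nt
too_ambiguous = "A" * 17_000 + "N" * 2_000   # roughly 10.5% ambiguous calls

for name, seq in [("clean", clean), ("too_short", too_short), ("too_ambiguous", too_ambiguous)]:
    print(name, passes_assembly_qc(seq))
```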
While we are continuing attempts to glean genomic information from compromised samples of the recent KGH cohort, important information may have been lost. In particular, samples from many EBOV-infected health-care workers at KGH, which could provide important insights into hospital-based transmissions, were compromised.
In combination with the 86 previously published EBOV Makona genomes (Gire et al., 2014), we analyzed a total of 318 genomes (see Experimental Procedures), all aligned against the earliest sampled Guinean genome (GenBank KJ660346.2). In this set, we observed 464 single nucleotide polymorphisms (SNPs; 125 nonsynonymous, 176 synonymous, and 163 noncoding). We also observed five single-base insertions and two double-base insertions in noncoding regions. We mapped all of the variants to primer-binding sites for known sequence-based diagnostics (Kugelman et al., 2015) and found no mutations in these sites that were present in more than one Sierra Leonean sample (Table S2).
We constructed a second, independent genome library for each of 150 high-quality samples from the KGH cohort to reliably determine intrahost single nucleotide variants (iSNVs) at low frequencies (Gire et al., 2014). We identified 247 iSNVs (25 insertion/deletions that were excluded from all analyses, 73 nonsynonymous, 71 synonymous, and 78 noncoding), including 21 iSNVs shared by multiple patients.
Very recently, another 175 EBOV Makona genomes were published based on a cohort from Sierra Leone, mostly sampled from the area of Freetown in the Fall of 2014 (Tong et al., 2015). Although these data were not included in our analyses, they are unlikely to significantly alter our primary findings (Figure S1).
##Limited Ebola Virus Exchange Across the Sierra Leonean Border
A previous study of EBOV Makona sequences elucidated viral transmission and evolution during the early stages of the outbreak in Sierra Leone (Gire et al., 2014), from late May to early June, 2014. The first reported EVD cases in Sierra Leone stemmed from two genetically distinct EBOV Makona lineages, believed to have been introduced from Guinea. One of these lineages (SL1) was more closely related to the then-available three Guinean genomes (2-5 mutations) than the second lineage (SL2), which was characterized by 4 additional mutations. This finding suggested that SL2 had evolved from SL1 some months before it was observed in Sierra Leone. A third lineage (SL3), derived from SL2, emerged in mid-June 2014. SL3 differs from SL2 by a single mutation at position 10,218, first found as an intrahost variant (polymorphism within one individual) at a low frequency. SL3 became the most prevalent lineage in Sierra Leone during the first three weeks of the outbreak there, with SL1 disappearing soon after the appearance of SL3. The SL3-defining mutation is epidemiologically important, as it is the first commonly circulating mutation observed to arise within Sierra Leone's borders.
As the epidemic developed within Sierra Leone, the SL3 lineage continued to dominate the viral population within the country, with no evidence for additional imported EBOV lineages. In our data set, 97% of the genomes belong to SL3, and the remainder to SL2 (Figure 1A). These results link all Sierra Leonean EVD cases to the initial introduction of EBOV into Sierra Leone, and they provide further evidence that all EVD cases during this outbreak arose from human-to-human transmission rather than from further zoonotic introductions from the unknown EBOV reservoir. This means that no newly imported viral diversity was detected after the initial introduction (Gire et al., 2014); all newly sampled viruses likely descended from those sequenced in the initial weeks of the outbreak. The genetic similarity of these viruses suggests importation from other countries was minimal, although we cannot definitively rule out a re-introduction from elsewhere for the SL2 viruses (3%) in our data set.
Similarly, publicly available EBOV genomes from this outbreak can shed light on exportation of EBOV from Sierra Leone into other countries. All published genomes from elsewhere, including 26 from Liberia and 4 from Mali, lack the Sierra Leone-defining SL3 mutation (Figure 1B; Experimental Procedures). Given that 97% of Sierra Leonean EBOV sequences are SL3, extensive exportation would result in the spread of SL3 EBOV genomes, a spread that is not seen in the limited samples available to date. At least in Sierra Leone, and with the exception of events at the onset of the epidemic, transmission has likely been primarily within national borders (Figure S2, Experimental Procedures), rather than by free interchange with neighboring countries.
##Viral Evolution During a Prolonged EVD Epidemic
We previously reported that new mutations accumulated more rapidly in the viral population early in the outbreak than over the long-term in the reservoir (Gire et al., 2014). We hypothesized then that the higher rate early in the outbreak resulted from incomplete purifying selection, that is, that we were detecting transient nonsynonymous variants that would later be removed by purifying selection (Pybus et al., 2006; Bedford et al., 2011). The observed evolutionary rate is thus not an estimate of the underlying mutation rate, since some deleterious mutations are purged by selection before they can be detected. But neither is it an estimate of the long-term substitution rate, since other deleterious mutations have not been eliminated by selection at the time of analysis. We hypothesized that the EBOV Makona evolutionary rate would decline following the addition of genomes covering a longer evolutionary time scale. Such a decline is well-characterized in other species (Duchene et al., 2014; Ho et al., 2005). With the present data set we were able to examine the evolution of the virus over a longer time period. We found that the most probable estimated evolutionary rate of EBOV Makona is indeed markedly lower (mean posterior rate = 1.25 × 10⁻³ substitutions per site per year), and closer to the long-term rate than to the rate estimated early in the outbreak (Figure 3A, S4).
How purifying selection acts at different time scales can also be seen in the distribution of mutations in the EBOV Makona genealogy. Deleterious mutations are more likely to result in transmission-impaired viruses and dead-end infections, and may therefore only be present in individual patients. Mutations unique to individual patients are those that occur on the external branches of the phylogenetic tree, whereas internal branch mutations are those present in multiple samples in our data set. Thus, in the model of incomplete purifying selection, we expect external branches to be characterized by a higher rate of nonsynonymous substitution than internal branches; in the latter, selection has had more opportunity to filter out deleterious mutants. Internal branches, by definition, have produced multiple descendent lineages and are thus less likely to include mutations with fitness costs. To test this hypothesis, we estimated the numbers of nonsynonymous and synonymous changes on the virus genealogy and recovered their accumulation rates (Figure 3B). Nonsynonymous mutations indeed occurred at lower frequency on internal than on external branches, suggesting that most are removed by purifying selection because of their fitness costs and hence represent evolutionary dead ends. Synonymous mutations, which likely have less impact on fitness, occurred at more comparable frequencies on internal and external branches.
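The internal versus external branch comparison can be illustrated with a count-based simplification. This sketch is not the genealogy-based estimate shown in Figure 3B: it assumes, following the definition in the text, that a mutation carried by exactly one sample lies on an external branch and one carried by several samples lies on an internal branch. The function name, the input layout, and the toy counts are hypothetical.

```python
from collections import Counter


def nonsyn_fraction_by_branch(mutations):
    """Fraction of coding mutations that are nonsynonymous, per branch class.

    `mutations` is an iterable of (n_samples_carrying, kind) pairs, where kind
    is "N" (nonsynonymous) or "S" (synonymous). A mutation seen in one sample
    is treated as external; one seen in several samples as internal.
    """
    counts = Counter()
    for n_samples, kind in mutations:
        branch = "external" if n_samples == 1 else "internal"
        counts[(branch, kind)] += 1
    fractions = {}
    for branch in ("external", "internal"):
        total = counts[(branch, "N")] + counts[(branch, "S")]
        # None marks a branch class with no coding mutations at all.
        fractions[branch] = counts[(branch, "N")] / total if total else None
    return fractions


# Toy input, not the study data: two private nonsynonymous changes, one private
# synonymous change, and three shared changes of which one is nonsynonymous.
toy_mutations = [(1, "N"), (1, "N"), (1, "S"), (5, "N"), (3, "S"), (7, "S")]
fractions = nonsyn_fraction_by_branch(toy_mutations)
print(fractions)
```

Under incomplete purifying selection, the external fraction is expected to exceed the internal one, as it does in this toy input.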
The relationship between the effectiveness of purifying selection and its duration is also apparent in the overall pattern of nonsynonymous mutations in our data set. Selection filters the accumulation of coding variants in the EBOV genome (Figure 3C, 4A). Nonsynonymous mutations, which are more likely to be deleterious, occur as a decreasing fraction of coding mutations as we analyze longer timescales: intrahost variants > individual patients (external branches) > multiple patients (internal branches) > between outbreaks. The last, the fraction seen between outbreaks, represents the effect of long periods of evolution in the unknown EBOV reservoir. As selection acts to remove deleterious alleles over time, fewer nonsynonymous mutations can be detected. This pattern holds true across the EBOV Makona genome (Figure 4A).
Although we observe less constraint on nonsynonymous changes during the 2013-2015 epidemic than between outbreaks, one anomaly is the genomic sequence encoding the mucin-like domain of the EBOV glycoprotein (GP), for which we observe more nonsynonymous substitutions than expected under neutrality, both within and between EVD outbreaks. Selective pressure acting on a region can be estimated with the standard statistic dN/dS, which has an expected value of 1.0 for neutral evolution and less than 1 for purifying selection; in the mucin-like domain, the mean posterior dN/dS within this outbreak is 4.74, and between outbreaks is 1.44 (Figure 4A). GP is the only surface-exposed viral protein on EBOV virions, and as such it is the primary target of antibodies (Murin 2014). This finding therefore raises the possibility that antibodies might be driving diversifying selection and rapid evolution in this region. This observation is based on a very small number of substitutions (8 nonsynonymous and 4 synonymous within the outbreak), however, and is not statistically significant (posterior probability that dN/dS is elevated within-outbreak = 92.9%); the situation should be clarified as more sequencing becomes available. If diversifying selection is occurring here, then the observed changes are very unlikely to represent population-level selection for transmission among humans; this would only occur if previously infected individuals were frequently being exposed to new infections. Instead, we hypothesize that these changes represent within-host selection for EBOV to escape a developing humoral immune response.
To test the hypothesis that antibodies drive diversifying selection of GP, we looked for enrichment of mutations within B cell epitopes in that protein. Effective humoral immunity depends on antibody binding to specific B cell epitopes (Becquart 2014, Murin 2014). Using experimentally determined B cell epitopes, obtained from the Virus Pathogen Database and Analysis Resource (ViPR, (Pickett 2012)), we found that nonsynonymous mutations in GP do indeed occur more frequently in epitopes than expected by chance (Figure 4B). This correlation supports the hypothesis that humoral immunity exerts selective pressure on the virus, driving immune evasion via accumulation of nonsynonymous mutations within GP B cell epitopes.
Visual inspection identified a subset of sequences that are more likely to be B cell escape variants (Figure 4C). In particular, three sequences (e.g. G4955.1) had a threonine-to-alanine mutation at GP amino acid position 485, a conserved threonine which is required for in vivo protection by the 14G7 antibody (Olal 2012). Additionally, two sequences had short stretches of T-to-C mutations in GP (4 or more T-to-C mutations within a 200 nucleotide region, Figure 4C), both of which occur within B cell epitopes.
Similar patterns of excess T-to-C mutations within short regions were also observed by Tong et al. (Tong 2015). In our dataset of 318 genomes, 5 possessed obvious stretches of T-to-C mutations within short regions. We also tested more broadly whether excessive T-to-C mutation occurred in all sequences. We found a significant enrichment of T-to-C transitions relative to all other types of transitions (Figure 4D). To determine if viral sequence divergence is related to T-to-C transition enrichment, we compared relative T-to-C transition rates in sequences with stretches of T-to-C mutations (n=5) to the top 5% of remaining sequences by sequence divergence (n=15), and to the bottom 95% of sequences (n=298) (Figure 4E). While the sequences with T-to-C stretches showed the strongest T-to-C enrichment, we found moderate enrichment of T-to-C transitions in the 5% most divergent sequences.
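The two T-to-C checks described above, a tally of transition types and a search for short stretches with four or more T-to-C changes within 200 nucleotides, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis behind Figures 4D and 4E: it assumes each sequence is already aligned base-for-base to the reference with no gaps, and the function names are hypothetical.

```python
from collections import Counter

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_transitions(ref, seq):
    """Count each transition type (reference base > sample base)."""
    counts = Counter()
    for r, s in zip(ref.upper(), seq.upper()):
        if (r, s) in TRANSITIONS:
            counts[f"{r}>{s}"] += 1
    return counts


def has_tc_stretch(ref, seq, window=200, min_count=4):
    """True if at least `min_count` T-to-C changes fall within `window` nucleotides."""
    positions = [
        i for i, (r, s) in enumerate(zip(ref.upper(), seq.upper()))
        if r == "T" and s == "C"
    ]
    # Positions are sorted, so compare each change with the one min_count - 1 later.
    return any(
        positions[i + min_count - 1] - positions[i] < window
        for i in range(len(positions) - min_count + 1)
    )


# Toy 20-nt alignment, not real EBOV sequence.
toy_ref = "T" * 10 + "A" * 10
toy_seq = "C" * 4 + "T" * 6 + "G" + "A" * 9
toy_counts = count_transitions(toy_ref, toy_seq)
print(dict(toy_counts), has_tc_stretch(toy_ref, toy_seq))
```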
Our findings from 232 EBOV Makona genomes sampled in Sierra Leone over 7 months during the 2013-2015 EVD outbreak in Western Africa demonstrate the value of continued sequencing throughout an epidemic. We tracked the movement of EBOV throughout Sierra Leone and determined the frequency of EBOV movement into and out of that country. Although it is not unlikely that the virus continued to cross the national borders of Sierra Leone throughout the epidemic, these observations suggest that, at least in late 2014, cross-border introductions were not an important factor in the development of the epidemic. We were unable, however, to draw any conclusions about export to Guinea, since few EBOV sequences from there are currently available.
The sequence data display EBOV Makona evolution in the context of prolonged human-to-human transmission, and provide an updated view of genomic diversity. Based on the rates of nonsynonymous and synonymous changes that are shared or are unique to an individual host, we concluded that purifying selection becomes increasingly effective over time, as it has more opportunity to remove deleterious mutants.
While the effects of purifying selection in this extended EVD outbreak are clear, these evolutionary changes do not imply that positive selection or adaptation to humans is occurring. Rather, the data suggest that natural selection acting over time is sufficient to remove newly arisen alleles that are less fit in the human environment. To date, no published study has found experimental evidence of selection for alleles beneficial to the virus within the current outbreak.
It is important to recognize, however, that the long-term human-to-human transmission observed during the 2013-2015 EVD outbreak is historically unique for EBOV. At the beginning of each EVD outbreak, EBOV enters the human population with little or no genetic diversity. In the case of the current EVD outbreak, EBOV has now maintained fitness while expanding across a much larger space of genetic diversity than in previous EVD outbreaks, the largest of which comprised only 318 human infections. This degree of diversity will undoubtedly affect researchers' ongoing efforts to develop or improve candidate diagnostics, vaccines, and therapeutics for EVD, many of which target EBOV sequences directly (PCR, nucleic-acid based therapeutics) or indirectly (antibody cocktails).
The mucin-like domain of the EBOV glycoprotein, in contrast to the rest of the EBOV genome, appeared to be under diversifying selection, based on a high ratio of nonsynonymous to synonymous mutations. While not statistically significant because of the small numbers of SNPs in the region, our observation is in agreement with many previous studies (Sanchez et al., 1998; Wertheim et al., 2009). As the EBOV GP, especially the mucin-like domain, is the target of many antibodies, a plausible hypothesis is that the humoral immune response exerts selective pressure on GP, resulting in an accumulation of nonsynonymous mutations. In support of this hypothesis, regions of GP corresponding to experimentally-determined B cell epitopes are significantly enriched in nonsynonymous, but not in synonymous, variants. There are two important caveats to this analysis: (1) these epitopes are determined in vitro and therefore may not be epitopes in vivo if they are not immunodominant, and (2) there is no experimental evidence to suggest that the majority of observed variants disrupt antibody binding to these epitopes.
While further experimental testing is required to validate an immune evasion hypothesis, we have highlighted a few prime candidates to consider. Genomes from three samples share a threonine-to-alanine mutation at GP amino acid position 485, a position that is conserved across all species of the Ebolavirus genus. This position is indispensable for binding of the protective antibody 14G7 (Olal 2012); the observed variant at this site may therefore be the result of escape from antibody-mediated selection. Additionally, two samples each possess multiple mutations within a single experimental B cell epitope in GP, which are likely to evade antibody recognition if those regions are relevant epitopes in vivo.
Intriguingly, the two samples with multiple mutations within a single B cell epitope each possess a distinct short stretch littered with T-to-C transitions, a phenomenon also observed in Tong et al (Tong 2015). Excessive T-to-C and A-to-G mutation of virus genomes has been observed previously as a result of adenosine deaminases acting on RNA (ADARs, (Gelinas 2011, Zahn 2006, Carpenter 2009)). When acting on viral genomic RNA, ADARs cause a pattern of excess A-to-G transitions that are represented by T-to-C transitions in our data set. These transitions are known to occur either promiscuously within 200 nucleotide stretches, or in a sequence-specific manner; therefore, we investigated both possibilities. While only five of the 318 sequences in our dataset contained obvious T-to-C stretches, we showed that the top 5% of sequences by sequence divergence, excluding the 5 sequences with T-to-C stretches, were also moderately enriched for T-to-C transitions across the genome. The remaining 95% of sequences appeared to show no enrichment. We do not know if this phenomenon is caused by ADAR acting upon genomic RNA, as we cannot exclude the possibility of bias by the EBOV RNA polymerase or other effects. Additionally, it is yet unclear if these T-to-C mutations have an anti-viral or other effect on viral fitness. These questions open avenues of research into molecular mechanisms shaping EBOV evolution.
The results of some of the specific genome analysis methods that we introduced here, while promising, will require denser EBOV genome sampling to yield sufficient information to influence the EVD outbreak response. Among these methods is transmission analysis, which could prove valuable for improved understanding of hospital-based transmissions and therefore for improved infection control. Inference of the ancestral genetic state is often straightforward, with clear patterns of new variations layering on previously existing variations; viruses that appear to be descended from others in the same data set are separated only by new mutations that are seen nowhere else in the data set. This kind of genetic relationship does not guarantee a transmission relationship between two patients, since many viruses can share identical genomes. However, since viruses with identical genomes are often epidemiologically related (Gire et al., 2014), we can infer that viruses that appear to descend from other viruses in our data set are either in or epidemiologically close to the same transmission chain.
Unfortunately, long delays in shipping samples from the field and required changes to the EBOV inactivation protocol caused severe degradation of many samples, which prevented identification of variants and transmission analysis. This loss should serve as a reminder that standardized and optimized protocols for sample collection, virus deactivation, and shipment are crucial for a rapid worldwide response to any new infectious disease outbreak. An important future research effort will be aimed at understanding which certified EVD sample deactivation protocols are best suited for high-quality genomic sequencing. Complications with sample shipment also emphasize the need for establishing in-country sequencing capabilities either before or at the onset of future EVD outbreaks (Folarin et al., 2014).
Beyond coordinated field and experimental responses, a culture of rapid data sharing is critical for teams around the world to have the best current information about a circulating virus or ongoing disease (Yozwiak et al., 2015). In light of this need, we released all data discussed in this paper publicly as they were generated, beginning in December 2014, well in advance of our own analysis. We have previously described our high-depth sequencing protocols (Matranga et al., 2014) and we are also now making available our computational analysis pipeline, in the hope that they will assist the many laboratories engaged in viral genomic research. As more EBOV genomic data become available, in particular for poorly covered Liberia and Guinea, the scientific community can together obtain a broader picture of transmission and evolution of EBOV Makona during the EVD epidemic.
##Sample Preparation from Kenema Government Hospital
This study included 575 blood samples from 484 patients with EVD confirmed by KGH laboratory staff from June 16 through September 28, 2014. Clinical samples were inactivated using Qiagen AVL and ethanol in the KGH laboratory prior to shipping out of the country.
##Sample Preparation from CDC Bo Laboratory
This study included 98 blood samples from 98 patients with EVD confirmed by CDC laboratory staff stationed in Bo, Sierra Leone, from August 20, 2014 through January 10, 2015. Clinical specimens from the CDC Bo laboratory in Sierra Leone were shipped to and stored at the Viral Special Pathogens Branch BSL-4 laboratory at CDC in Atlanta, GA. Samples were inactivated, and RNA was extracted using the MagMAX Pathogen RNA/DNA isolation kit (Invitrogen) and BeadRetriever (Invitrogen). Non-infectious RNA was treated with DNase I RNase-free (Roche) prior to shipment to the Broad Institute.
Host ribosomal and carrier poly(rA) RNA depletion, randomly-primed cDNA synthesis, Nextera XT library construction and 101-bp paired-end Illumina sequencing were performed as described previously (Gire et al., 2014; Matranga et al., 2014).
##Ebola Virus Makona Genome Assembly and Analysis
EBOV Makona genomes were assembled from high-throughput sequencing data using an updated bioinformatics pipeline based on our previously described methods (Gire et al., 2014; Matranga et al., 2014). Of the collected samples, 150 KGH and 82 CDC samples yielded sufficient EBOV genome sequencing coverage for high-quality de novo genome assembly. Further description of the pipeline can be found in the Extended Experimental Procedures.
Our Linux-based software pipeline is publicly available at https://github.com/broadinstitute/viral-ngs (Park et al., 2015). This pipeline includes command-line tools for each of the above steps and optional Snakemake workflows (Koster et al., 2012) to automate them either sequentially or in parallel.
The assembly pipeline is also available via the DNAnexus cloud platform. RNA paired-end reads from either HiSeq or MiSeq instruments (Illumina) can be securely uploaded in FASTQ or BAM format and processed through the pipeline using graphical and command-line interfaces. Instructions for the cloud analysis pipeline are available at https://github.com/dnanexus/viral-ngs/wiki
##Genomic Epidemiology of Ebola virus Makona
The following publicly available EBOV Makona genomes from outside of Sierra Leone do not carry the SL3-derived allele at position 10,218: 26 available genomes from Liberia (25 from Kugelman et al. (2015), 1 from GenBank KP178538.1) and all 4 available genomes from Mali (Hoenen et al., 2015). A median-joining haplotype network was constructed in PopART version 1.7.2 (http://popart.otago.ac.nz). Due to the presence of missing data, 1,492 sites (7.9% of total genome) were excluded from the analysis; these sites included 61 sites with variability among isolates (10.9% of all variable sites).
To reconstruct the EBOV Makona transmission history within Sierra Leone, we grouped samples into sets of one or more genetically identical viruses based on their consensus sequences. We then identified relationships between these groups, progressing from the Guinean reference genome (KJ660346.2) and ending with nine viruses sampled in Freetown (eight from our KGH and CDC cohorts and one sequenced in Italy).
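The first step described above, grouping samples into sets of genetically identical viruses, can be sketched as a dictionary keyed on the consensus sequence. This is a minimal illustration, not the authors' implementation: it assumes identity means exact string equality of aligned consensus sequences, and the source does not say how ambiguous or missing bases were treated. The function name and sample labels are hypothetical.

```python
from collections import defaultdict


def group_identical(consensus):
    """Group sample names that share an identical consensus sequence.

    `consensus` maps a sample name to its aligned consensus sequence.
    Returns a list of groups (sorted name lists), largest group first.
    """
    groups = defaultdict(list)
    for sample, seq in consensus.items():
        groups[seq.upper()].append(sample)
    return sorted((sorted(names) for names in groups.values()),
                  key=lambda names: (-len(names), names))


# Toy sequences, not real EBOV data.
toy_consensus = {"s1": "ACGT", "s2": "acgt", "s3": "ACGA", "s4": "ACGT"}
toy_groups = group_identical(toy_consensus)
print(toy_groups)
```

Relationships between groups would then be inferred from the mutations that separate them, which this sketch does not attempt.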
##Intrahost Variant Analysis
Full details of the identification and calling of intrahost variants (iSNVs) are available in Extended Experimental Procedures; iSNV calls and analyses are available in Data S1. Evolutionary distances between pairs of phylogeny tips were computed from the posterior sample of trees produced by Bayesian evolutionary analysis by sampling trees (BEAST; Drummond et al., 2012). This calculation integrates across phylogenetic uncertainty and produces a temporal evolutionary distance between phylogeny tips. We used this distance matrix to calculate the average distance between pairs of phylogeny tips that share an iSNV and compared the result to the average distance between random pairs of tips. We calculated a p value for the observed average distance by conducting a randomization test. In each random replicate, we sampled the same distribution of iSNV-possessing tips as observed in the empirical data, and calculated the average distance between these pairs of tips. We calculated a p value by comparing the empirical mean distance to the mean distances observed over 10,000 random replicates.
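The randomization test described above can be sketched in a few lines. This is an illustration under assumptions, not the script in Data S1: it assumes a one-sided test (tips sharing an iSNV are closer than random tips), preserves the observed group sizes in each replicate, and uses the common (hits + 1) / (replicates + 1) estimate of the p value. The function name, the toy tips, and the toy distances are hypothetical.

```python
import random
from itertools import combinations


def randomization_test(shared_groups, all_tips, dist, n_rep=10_000, seed=0):
    """Compare the mean distance between tips sharing an iSNV with random tips.

    shared_groups: list of tip lists, one list per shared iSNV.
    all_tips: list of every tip in the phylogeny.
    dist: dict mapping frozenset({tip_a, tip_b}) to an evolutionary distance.
    Returns (observed mean distance, one-sided p value).
    """
    rng = random.Random(seed)

    def mean_distance(groups):
        d = [dist[frozenset(pair)] for g in groups for pair in combinations(g, 2)]
        return sum(d) / len(d)

    observed = mean_distance(shared_groups)
    hits = 0
    for _ in range(n_rep):
        # Draw random tips while keeping the number of tips per iSNV unchanged.
        random_groups = [rng.sample(all_tips, len(g)) for g in shared_groups]
        if mean_distance(random_groups) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_rep + 1)


# Toy phylogeny: six tips placed on a line, distance = absolute difference.
position = {"a": 0, "b": 1, "c": 10, "d": 11, "e": 20, "f": 21}
tips = sorted(position)
toy_dist = {frozenset((x, y)): abs(position[x] - position[y])
            for x, y in combinations(tips, 2)}
observed_mean, p_value = randomization_test([["a", "b"], ["c", "d"]], tips, toy_dist)
print(observed_mean, p_value)
```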
Epitope data were obtained from the NIAID Virus Pathogen Database and Analysis Resource (ViPR) at http://www.viprbrc.org (Pickett et al., 2012). As most of the epitopes in the database are based on the Mayinga reference strain, we mapped all B cell epitopes against the Guinean reference strain (GenBank KJ660346.2) and removed all epitopes that no longer matched perfectly, leaving 40 B cell epitopes. Overlapping epitopes were merged, and the nonsynonymous and synonymous consensus SNPs and iSNVs were scored as within or outside of epitope regions with bedtools. Significance was determined by a two-tailed binomial test with α = 0.05, under the null hypothesis that variants occur in epitope regions of GP by chance with a probability of 172/676, that is, the fraction of GP residues that lie in a B cell epitope.
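The scoring and test can be sketched in plain Python under stated assumptions. Intervals are 1-based, inclusive residue coordinates, and the two-tailed p value sums the probabilities of all outcomes no more likely than the observed count. That is one common definition, and the text does not say which was used. The epitope coordinates and variant positions are hypothetical. Only the null probability 172/676 comes from the text, and the study itself used bedtools for the overlap step.

```python
from math import comb

def merge_intervals(intervals):
    """Merge overlapping 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def in_epitope(position, merged):
    """True if the residue position falls inside any merged interval."""
    return any(start <= position <= end for start, end in merged)

def binomial_two_tailed(k, n, p):
    """Exact two-tailed p value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))

P_EPITOPE = 172 / 676  # fraction of GP residues in a B cell epitope (from the text)

# Hypothetical epitope coordinates and variant positions
epitopes = [(10, 20), (15, 30), (40, 50)]
variant_positions = [12, 35, 45, 60]

merged = merge_intervals(epitopes)
hits = sum(in_epitope(pos, merged) for pos in variant_positions)
p_value = binomial_two_tailed(hits, len(variant_positions), P_EPITOPE)
print(hits, len(variant_positions), p_value, p_value < 0.05)
```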
Three data sets were constructed to represent three time scales of genetic surveillance of EBOV Makona. For surveillance between EVD outbreaks, we used 63 publicly available sequences representing the diversity of EBOV sampled over long periods of time, including the first recorded EVD outbreak in 1976 and other EVD outbreaks, and excluding one outbreak that occurred in the Democratic Republic of the Congo in 2014. We also included EBOV genome fragment sequences from possibly infected great ape carcasses and frugivorous bats. Fourteen sequences from Western Africa were chosen to represent the current 2013-2015 EVD outbreak. For surveillance of the early outbreak, 81 sequences (Baize et al., 2014; Gire et al., 2014) were reanalyzed, representing the earliest epidemiologically relevant and publicly available sequences. For surveillance of the prolonged epidemic, the 232 EBOV genomes reported here were combined with 5 sequences from repatriated health-care workers (UK1, UK2, UK3, INMI1, GE1) and the 81 sequences from the early outbreak data set.
Analyses of rates, phylogenies, and evolution were performed on all three data sets in BEAST (Drummond et al., 2012). Full details on the models and parameters are available in Extended Experimental Procedures. All BEAST inputs, outputs, and analysis scripts are available in Data S2.
The contributions of each author are too extensive to list in detail, but among the first five and last four authors, A.G. and R.F.G. collected samples. A.G. and S.L.M.W. processed samples for sequencing. D.J.P., G.D., S.W., S.L.M.W., and A.R. analyzed sequence data. D.J.P., G.D., S.W., S.L.M.W., U.S., A.R., and P.C.S. wrote the paper. U.S., A.R., R.F.G., and P.C.S. jointly supervised this work.
We thank KGH staff who died of EVD (including M. Fonnie, A. Moigboi, A. Kovoma, M. Fullah, and S. H. Khan), the Office of the President of Sierra Leone (President E. Koroma, M. Jones), the Sierra Leone Ministry of Health and Sanitation, the Kenema District Health Management team, and the Kenema Lassa fever program for their immense efforts in the EVD outbreak response. We thank Public Health England (UK1, UK2, UK3), IRCCS Lazzaro Spallanzani (INMI1), and the University of Geneva (GE1) for providing EBOV genome sequences from samples of EVD patients exported from Sierra Leone. We thank the drivers, pilots, phlebotomists, non-governmental organizations, district medical officers, and district surveillance officers for their help with sample collection and logistics in Sierra Leone. We want to especially thank the Médecins Sans Frontières (MSF) operation centers for their continuing support of the US Centers for Disease Control and Prevention (CDC) laboratory in Bo, Sierra Leone, and the World Health Organization (WHO) for their support of the preceding CDC laboratory operation in Kenema, Sierra Leone.
This work was supported by European Union grant FP7/2007-2013 278433-PREDEMICS and European Research Council grant 260864 (A.R.); Natural Environment Research Council grant D76739X (G.D.); NIH U54 GM111274 (T.B.); NIH grant GM080177 (S.W.); NIH grant 1U01HG007480-01 (C.H.); National Science Foundation Graduate Research Fellowship Grant No. DGE 1144152 (A.E.L.); the National Health and Medical Research Council, Australia (E.C.H.); the Defense Threat Reduction Agency (USAMRIID); NIH/NIAID U19AI110818 (Broad Institute); the Bill and Melinda Gates Foundation OPP1123407 (Broad Institute); and NIAID HHSN272200900049C (Harvard/Tulane). This work was funded in part through Battelle Memorial Institute’s prime contract with the US National Institute of Allergy and Infectious Diseases (NIAID) under Contract No. HHSN272200700016I. Subcontractors to Battelle Memorial Institute who performed this work are: J.H.K., an employee of Tunnell Government Services, Inc. R.F.G. is co-founder of Zalgen Labs.
The Virus Pathogen Database and Analysis Resource (ViPR) has been wholly funded with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN272201400028C.
This paper was authored in Authorea and its edit history is available here: https://www.authorea.com/users/10734/articles/19957.
The content of this publication does not necessarily reflect the views or policies of the US Department of Health and Human Services (Centers for Disease Control and Prevention, National Institutes of Health) or the US Army.
Genome assemblies, annotations, and raw reads are available at NCBI on GenBank and SRA using the following BioProject IDs: PRJNA257197 (samples from Kenema Government Hospital) and PRJNA283385 (samples from CDC Bo Lab). Note that PRJNA257197 also includes all previously published data from Gire et al. (2014).
Samuel Alizon, Sébastien Lion, Carmen Lía Murall, Jessica L Abbate. Quantifying the epidemic spread of Ebola virus (EBOV) in Sierra Leone using phylodynamics. Virulence 5, 825–827 Informa UK Limited, 2014. Link
Sylvain Baize, Delphine Pannetier, Lisa Oestereich, Toni Rieger, Lamine Koivogui, NFaly Magassouba, Barrè Soropogui, Mamadou Saliou Sow, Sakoba Keïta, Hilde De Clerck, Amanda Tiffany, Gemma Dominguez, Mathieu Loua, Alexis Traoré, Moussa Kolié, Emmanuel Roland Malano, Emmanuel Heleze, Anne Bocquin, Stephane Mély, Hervé Raoul, Valérie Caro, Dániel Cadar, Martin Gabriel, Meike Pahlmann, Dennis Tappe, Jonas Schmidt-Chanasit, Benido Impouma, Abdoul Karim Diallo, Pierre Formenty, Michel Van Herp, Stephan Günther. Emergence of Zaire Ebola Virus Disease in Guinea. New England Journal of Medicine 371, 1418–1425 New England Journal of Medicine (NEJM/MMS), 2014. Link
Pierre Becquart, Tanel Mahlakõiv, Dieudonné Nkoghe, Eric M. Leroy. Identification of Continuous Human B-Cell Epitopes in the VP35, VP40, Nucleoprotein and Glycoprotein of Ebola Virus. PLoS ONE 9, e96360 Public Library of Science (PLoS), 2014. Link
Trevor Bedford, Sarah Cobey, Mercedes Pascual. Strength and tempo of selection revealed in viral gene genealogies. BMC Evolutionary Biology 11, 220 Springer Science + Business Media, 2011. Link
Jennifer A Carpenter, Liam P Keegan, Lena Wilfert, Mary A OConnell, Francis M Jiggins. Evidence for ADAR-induced hypermutation of the Drosophila sigma virus (Rhabdoviridae). BMC Genetics 10, 75 Springer Science + Business Media, 2009. Link
Alexei J Drummond, Marc A Suchard, Dong Xie, Andrew Rambaut. Bayesian Phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29 (2012). Link
S. Duchene, E. C. Holmes, S. Y. W. Ho. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proceedings of the Royal Society B: Biological Sciences 281, 20140732 The Royal Society, 2014. Link
Onikepe A Folarin, Anise N Happi, Christian T Happi. Empowering African genomics for infectious disease control. Genome Biology 15 Springer Science + Business Media, 2014. Link
J.-F. Gelinas, G. Clerzius, E. Shaw, A. Gatignol. Enhancement of Replication of RNA Viruses by ADAR1 via RNA Editing and Inhibition of RNA-Activated Protein Kinase. Journal of Virology 85, 8460–8466 American Society for Microbiology, 2011. Link
S. K. Gire, A. Goba, K. G. Andersen, R. S. G. Sealfon, D. J. Park, L. Kanneh, S. Jalloh, M. Momoh, M. Fullah, G. Dudas, S. Wohl, L. M. Moses, N. L. Yozwiak, S. Winnicki, C. B. Matranga, C. M. Malboeuf, J. Qu, A. D. Gladden, S. F. Schaffner, X. Yang, P.-P. Jiang, M. Nekoui, A. Colubri, M. R. Coomber, M. Fonnie, A. Moigboi, M. Gbakie, F. K. Kamara, V. Tucker, E. Konuwa, S. Saffa, J. Sellu, A. A. Jalloh, A. Kovoma, J. Koninga, I. Mustapha, K. Kargbo, M. Foday, M. Yillah, F. Kanneh, W. Robert, J. L. B. Massally, S. B. Chapman, J. Bochicchio, C. Murphy, C. Nusbaum, S. Young, B. W. Birren, D. S. Grant, J. S. Scheiffelin, E. S. Lander, C. Happi, S. M. Gevao, A. Gnirke, A. Rambaut, R. F. Garry, S. H. Khan, P. C. Sabeti. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345, 1369–1372 American Association for the Advancement of Science (AAAS), 2014. Link
Simon Y. W. Ho, Matthew J. Phillips, Alan Cooper, Alexei J. Drummond. Time Dependency of Molecular Rate Estimates and Systematic Overestimation of Recent Divergence Times. Molecular Biology and Evolution 22, 1561–1568 Oxford University Press (OUP), 2005. Link
T. Hoenen, D. Safronetz, A. Groseth, K. R. Wollenberg, O. A. Koita, B. Diarra, I. S. Fall, F. C. Haidara, F. Diallo, M. Sanogo, Y. S. Sarro, A. Kone, A. C. G. Togo, A. Traore, M. Kodio, A. Dosseh, K. Rosenke, E. de Wit, F. Feldmann, H. Ebihara, V. J. Munster, K. C. Zoon, H. Feldmann, S. Sow. Mutation rate and genotype variation of Ebola virus from Mali case sequences. Science 348, 117–119 American Association for the Advancement of Science (AAAS), 2015. Link
Hossein Khiabanian, Kevin J. Emmett, Albert Lee, Raul Rabadan. High-resolution Genomic Surveillance of 2014 Ebolavirus Using Shared Subclonal Variants. PLoS Currents Outbreaks Public Library of Science (PLoS), 2015. Link
J. Koster, S. Rahmann. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 Oxford University Press (OUP), 2012. Link
Jeffrey R. Kugelman, Mariano Sanchez-Lockhart, Kristian G. Andersen, Stephen Gire, Daniel J. Park, Rachel Sealfon, Aaron E. Lin, Shirlee Wohl, Pardis C. Sabeti, Jens H. Kuhn, Gustavo F. Palacios. Evaluation of the Potential Impact of Ebola Virus Genomic Drift on the Efficacy of Sequence-Based Candidate Therapeutics. mBio 6, e02227–14 American Society for Microbiology, 2015. Link
Jeffrey R. Kugelman, Michael R. Wiley, Suzanne Mate, Jason T. Ladner, Brett Beitzel, Lawrence Fakoli, Fahn Taweh, Karla Prieto, Joseph W. DiClaro, Timothy Minogue, Randal J. Schoepp, Kurt E. Schaecher, James Pettitt, Stacey Bateman, Joseph Fair, Jens H. Kuhn, Lisa Hensley, Daniel J. Park, Pardis C. Sabeti, Mariano Sanchez-Lockhart, Fatorma K. Bolay, Gustavo Palacios. Monitoring of Ebola Virus Makona Evolution through Establishment of Advanced Genomic Capability in Liberia. Emerging Infectious Diseases 21 Centers for Disease Control and Prevention (CDC), 2015. Link
Jens Kuhn, Kristian Andersen, Sylvain Baize, Yīmíng Bào, Sina Bavari, Nicolas Berthet, Olga Blinkova, J. Brister, Anna Clawson, Joseph Fair, Martin Gabriel, Robert Garry, Stephen Gire, Augustine Goba, Jean-Paul Gonzalez, Stephan Günther, Christian Happi, Peter Jahrling, Jimmy Kapetshi, Gary Kobinger, Jeffrey Kugelman, Eric Leroy, Gael Maganga, Placide Mbala, Lina Moses, Jean-Jacques Muyembe-Tamfum, Magassouba NFaly, Stuart Nichol, Sunday Omilabu, Gustavo Palacios, Daniel Park, Janusz Paweska, Sheli Radoshitzky, Cynthia Rossi, Pardis Sabeti, John Schieffelin, Randal Schoepp, Rachel Sealfon, Robert Swanepoel, Jonathan Towner, Jiro Wada, Nadia Wauquier, Nathan Yozwiak, Pierre Formenty. Nomenclature- and Database-Compatible Names for the Two Ebola Virus Variants that Emerged in Guinea and the Democratic Republic of the Congo in 2014. Viruses 6, 4760–4799 MDPI AG, 2014. Link
Christian B Matranga, Kristian G Andersen, Sarah Winnicki, Michele Busby, Adrianne D Gladden, Ryan Tewhey, Matthew Stremlau, Aaron Berlin, Stephen K Gire, Eleina England, Lina M Moses, Tarjei S Mikkelsen, Ikponmwonsa Odia, Philomena E Ehiane, Onikepe Folarin, Augustine Goba, S Kahn, Donald S Grant, Anna Honko, Lisa Hensley, Christian Happi, Robert F Garry, Christine M Malboeuf, Bruce W Birren, Andreas Gnirke, Joshua Z Levin, Pardis C Sabeti. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples. Genome Biology 15, 519 Springer Science + Business Media, 2014. Link
Charles D. Murin, Marnie L. Fusco, Zachary A. Bornholdt, Xiangguo Qiu, Gene G. Olinger, Larry Zeitlin, Gary P. Kobinger, Andrew B. Ward, Erica Ollmann Saphire. Structures of protective antibodies reveal sites of vulnerability on Ebola virus. Proc Natl Acad Sci USA 111, 17182–17187 Proceedings of the National Academy of Sciences, 2014. Link
Daniel Olal, Ana I. Kuehne, Shridhar Bale, Peter Halfmann, Takao Hashiguchi, Marnie L. Fusco, Jeffrey E. Lee, Liam B. King, Yoshihiro Kawaoka, John M. Dye, Erica Ollmann Saphire. Structure of an Antibody in Complex with Its Mucin Domain Linear Epitope That Is Protective against Ebola Virus. Journal of Virology 86, 2809–2816 (2012). Link
BE Pickett, EL Sadat, Y Zhang, JM Noronha, RB Squires, V Hunt, M Liu, S Kumar, S Zaremba, Z Gu, L Zhou, CN Larson, J Dietrich, EB Klem, RH Scheuermann. ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res 40, D593-8 (2012).
O. G. Pybus, A. Rambaut, R. Belshaw, R. P. Freckleton, A. J. Drummond, E. C. Holmes. Phylogenetic Evidence for Deleterious Mutation Load in RNA Viruses and Its Contribution to Viral Evolution. Molecular Biology and Evolution 24, 845–852 Oxford University Press (OUP), 2006. Link
Anthony Sanchez, Sam G. Trappier, Ute Ströher, Stuart T. Nichol, Michael D. Bowen, Heinz Feldmann. Variation in the Glycoprotein and VP35 Genes of Marburg Virus Strains. Virology 240, 138–146 Elsevier BV, 1998. Link
Tanja Stadler, Denise Kühnert, David A. Rasmussen, Louis du Plessis. Insights into the Early Epidemic Spread of Ebola in Sierra Leone Provided by Viral Sequence Data. PLoS Currents Outbreaks Public Library of Science (PLoS), 2014. Link
Yi-Gang Tong, Wei-Feng Shi, Di Liu, Jun Qian, Long Liang, Xiao-Chen Bo, Jun Liu, Hong-Guang Ren, Hang Fan, Ming Ni, Yang Sun, Yuan Jin, Yue Teng, Zhen Li, David Kargbo, Foday Dafae, Alex Kanu, Cheng-Chao Chen, Zhi-Heng Lan, Hui Jiang, Yang Luo, Hui-Jun Lu, Xiao-Guang Zhang, Fan Yang, Yi Hu, Yu-Xi Cao, Yong-Qiang Deng, Hao-Xiang Su, Yu Sun, Wen-Sen Liu, Zhuang Wang, Cheng-Yu Wang, Zhao-Yang Bu, Zhen-Dong Guo, Liu-Bo Zhang, Wei-Min Nie, Chang-Qing Bai, Chun-Hua Sun, Xiao-Ping An, Pei-Song Xu, Xiang-Li-Lan Zhang, Yong Huang, Zhi-Qiang Mi, Dong Yu, Hong-Wu Yao, Yong Feng, Zhi-Ping Xia, Xue-Xing Zheng, Song-Tao Yang, Bing Lu, Jia-Fu Jiang, Brima Kargbo, Fu-Chu He, George F. Gao, Wu-Chun Cao. Genetic diversity and evolutionary dynamics of Ebola virus in Sierra Leone. Nature Nature Publishing Group, 2015. Link
Erik Volz, Sergei Pond. Phylodynamic Analysis of Ebola Virus in the 2014 Sierra Leone Epidemic. PLoS Currents Outbreaks Public Library of Science (PLoS), 2014. Link
WHO. Ebola Situation Reports. (2015). Link
J. O. Wertheim, M. Worobey. Relaxed Selection and the Evolution of RNA Virus Mucin-Like Pathogenicity Factors. Journal of Virology 83, 4690–4694 American Society for Microbiology, 2009. Link
Nathan L. Yozwiak, Stephen F. Schaffner, Pardis C. Sabeti. Data sharing: Make outbreak research open access. Nature 518, 477–479 Nature Publishing Group, 2015. Link
R. C. Zahn, I. Schelp, O. Utermohlen, D. von Laer. A-to-G Hypermutation in the Genome of Lymphocytic Choriomeningitis Virus. Journal of Virology 81, 457–464 American Society for Microbiology, 2006. Link