Ancient Clostridium DNA and variants of tetanus neurotoxins associated with human archaeological remains

Identification and assembly of C. tetani-related genomes from aDNA samples

To explore the evolution and diversity of C. tetani, we performed a large-scale search of the entire NCBI Sequence Read Archive (SRA; 10,432,849 datasets from 291,458 studies totaling ~18 petabytes; June 8, 2021) for datasets potentially containing C. tetani DNA signatures. Since typical homology-based search methods (e.g., BLAST29) could not be applied at such a large scale, we used the recently developed Sequence Taxonomic Analysis Tool (STAT)30 to search the SRA and identified 136 sequencing datasets possessing the highest total C. tetani DNA content (k-mer abundance >23,000 reads, k = 32 base pair fragments mapping to the C. tetani genome) (Fig. 1a and Supplementary Data 1). Our search identified 28 previously sequenced C. tetani genomes (which serve as positive controls), as well as 108 uncharacterized sequencing runs (79 of human origin) with high predicted levels of C. tetani DNA content. Unexpectedly, 76 (96.2%) of these are aDNA datasets collected from human archeological specimens (Fig. 1a), with the remaining three datasets being from modern human gut microbiome samples.
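The idea behind a k-mer screen of this kind can be sketched in a few lines. The following is a simplified illustration and not STAT's actual implementation: the function names, the toy sequences, and the use of exact set intersection are assumptions made for the example, and k is reduced from 32 to 8 so that the toy data stay readable.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_matching_reads(reads, reference, k=32):
    """Count reads that share at least one exact k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    return sum(1 for read in reads if kmers(read, k) & ref_kmers)


# Toy data (hypothetical); k is shortened from 32 to 8 for readability.
reference = "ACGTACGTGGCCTTAA"
reads = ["ACGTACGT", "TTTTTTTT", "GGCCTTAA"]
hits = count_matching_reads(reads, reference, k=8)

# Share of human-origin runs that are aDNA, using the counts reported above.
percent_ancient = round(100 * 76 / 79, 1)
```

In a real screen, a dataset would be flagged when its count of matching reads exceeds a chosen threshold (here, more than 23,000 reads).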

Fig. 1: Petabase-scale screen of the NCBI sequence read archive reveals C. tetani-related genomes in ancient human archeological samples.

a General bioinformatic workflow starting with the analysis of 43,620 samples from the NCBI sequence read archive. Each sample is depicted according to its C. tetani k-mer abundance (y axis) versus the natural log of the overall dataset size in megabases (x axis). A threshold was used to distinguish samples with high detected C. tetani DNA content, and these data points are colored by sample origin: modern C. tetani genomes (red), non-human (light blue), modern human (blue), ancient human (black). The pie chart displays a breakdown of identified SRA samples with a high abundance of C. tetani DNA signatures. The 38 aDNA samples predicted to contain C. tetani DNA were further analyzed as shown in the bioinformatic pipeline on the right. b Top—density plot of the percentage identities of all BLAST local alignments detected between acBins and reference genomes including C. tetani, C. cochlearium, and other Clostridium spp. Bottom—density plot of the checkM results for the 38 acBins including estimated completeness, contamination, and strain heterogeneity levels. Completeness and contamination levels are percentage values. c MapDamage damage rates (5’ C → T misincorporation frequency) for acBins (n = 38 biologically independent samples) subdivided by UDG treatment (none (n = 27), partial (n = 5), and full (n = 6)). Also shown are the damage rates for modern C. tetani genomes (n = 21 biologically independent samples). The boxplots depict the lower quartile, median, and upper quartile of the data, with whiskers extending to 1.5 times the interquartile range (IQR) above the third quartile or below the first quartile. d Damage plots for the top five acBins with the highest damage rates, and corresponding mtDNA damage plots. Shown is the frequency of C → T (red) and G → A (blue) misincorporations at the first and last 25 bases of sequence fragments. Increased misincorporation frequency at the edges of reads is characteristic of ancient DNA. 
Source data for (a–d) are provided as a Source Data file.

These 76 ancient DNA datasets are sequencing runs derived from 38 distinct archeological samples, which include tooth samples from aboriginal inhabitants of the Canary Islands from the 7th to 11th centuries CE31, tooth samples from the Sanganji Shell Mound of the Jomon in Japan (~1044 BCE)32, Egyptian mummy remains from ~1879 BCE to 53 CE33, and ancient Chilean Chinchorro mummy remains from ~3889 BCE34 (Supplementary Data 2). The 38 aDNA samples vary in terms of sample type (31 tooth, 6 bone and 1 chest extract), burial practices (27 regular inhumation and 11 mummies), sequencing method (26 shotgun datasets and 12 bait-capture approaches), and DNA treatment (6 UDG-treated, 5 partial UDG-treated and 27 untreated samples), all of which need to be considered when interpreting downstream analyses (Supplementary Data 2).

Although these archeological samples are of human origin, STAT analysis of the 38 DNA samples predicted a predominantly microbial composition (~90% median across samples, Supplementary Fig. 1). The predominance of microbial DNA in ancient human tooth samples is expected and consistent with previous studies, which have shown microbial DNA proportions as high as 95–99%13,17,18,35. C. tetani-related DNA was consistently abundant among predicted microbial communities, detected at 13.8% average relative abundance (Supplementary Fig. 1 and Supplementary Data 3). A total of 85 species were detected at ≥2% abundance in at least one sample (Supplementary Data 4). While 65 of these species have been associated with humans or animals, 20 species have an environment-specific origin, and provide an estimate of possible environmental microbial contamination that could aid in interpretation of results (Supplementary Data 4, Supplementary Fig. 2). Putative environment-specific microbes make up a low proportion of the microbially classified reads: ≤10% for 33 samples and ≤5% for 24 samples (Supplementary Data 5). The three samples with the highest estimated proportions of reads from putative environment-specific microbes were Tenerife-012-Tooth, Vác-Mummy-Tissue, and Tenerife-013-Tooth (Supplementary Data 5). Also noteworthy is that M. tuberculosis and Y. pestis were detected (Supplementary Fig. 1) in several datasets associated with bait-capture sequencing of M. tuberculosis and Y. pestis from archeological samples36,37,38.

To further explore the putative C. tetani in aDNA samples, we performed metagenome assembly using MEGAHIT39 for each individual sample and taxonomically classified assembled contigs using both Kaiju40 and BLAST29 to identify those mapping unambiguously to C. tetani and not other bacterial species (Supplementary Data 6 and 7). A majority (73%) of the alignments between assembled contigs and reference C. tetani genomes had percentage identities exceeding 99% (Fig. 1b). Ninety percent of the alignments had percentage identities exceeding 90%, suggesting that a large fraction of assembled contigs are highly similar to regions of modern C. tetani genomes. Based on mapping of reads to the C. tetani chromosome, the 38 samples had a 1× percent coverage ranging from 28 to 94% (mean of 78.3%) and a 5× coverage ranging from 9 to 93% (mean of 57.5%) (Supplementary Data 2). A subset of 16 samples had a 1× C. tetani chromosome coverage exceeding 90%.
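The 1× and 5× coverage figures quoted here are breadth-of-coverage values: the percentage of reference positions covered by at least that many reads. A minimal sketch of the calculation follows, assuming per-position read depths are already available as a list; the function name and the toy depth values are hypothetical and are not taken from the study's pipeline.

```python
def breadth_of_coverage(depths, min_depth):
    """Percentage of reference positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Toy per-position depths along a 10-base reference (hypothetical values).
depths = [0, 1, 5, 7, 0, 2, 5, 1, 0, 9]
cov_1x = breadth_of_coverage(depths, 1)  # positions with depth >= 1
cov_5x = breadth_of_coverage(depths, 5)  # positions with depth >= 5
```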

For each of the ancient DNA samples, we binned together all C. tetani-like contigs to result in 38 putative, ancient DNA-associated clostridial genome bins or “acBins”. We then performed QC analysis of each acBin using CheckM41 to estimate genome completeness and contamination (Fig. 1b and Supplementary Data 8). CheckM estimates genome completeness based on the detected presence of taxon-specific marker genes, and uses duplicated marker genes (if present) to estimate contamination and heterogeneity41. Eighteen acBins were more than 50% complete and 11 were more than 70% complete. Thirty-seven acBins had low (<10%) checkM contamination (Supplementary Data 8). acBins with higher genome completeness were associated with datasets produced by shotgun sequencing rather than capture methods, as these datasets had higher levels of C. tetani DNA content (Supplementary Fig. 3). We also examined the acBins for potential strain heterogeneity using two independent approaches: CheckM estimation (Supplementary Data 8) as well as quantification of per-base heterogeneity from mapped reads (Supplementary Data 2). These two metrics had a weak but significant correlation (r = 0.38, P = 0.019) (Supplementary Fig. 4a). Five strains (Sanganji-A2-Tooth, Chinchorro-Mummy-Bone, SLC-France-Tooth, Karolva-Tooth, Chincha-UC12-24-Tooth) were identified as possessing higher estimated levels of strain variation, but all were below 6% (CheckM) and 1.1% (average base heterogeneity).
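The marker-gene logic described above can be illustrated with a deliberately simplified sketch. CheckM itself works with lineage-specific, collocated marker sets, so the code below is not its algorithm; the function name, the marker identifiers, and the copy counts are hypothetical and only show how presence and duplication of markers translate into completeness and contamination percentages.

```python
def marker_based_qc(marker_counts):
    """Estimate completeness and contamination from marker-gene copy counts.

    marker_counts maps each expected single-copy marker to the number of
    copies found in the bin. Simplified: every marker is weighted equally.
    """
    total = len(marker_counts)
    if total == 0:
        raise ValueError("at least one marker is required")
    present = sum(1 for c in marker_counts.values() if c >= 1)
    extra_copies = sum(c - 1 for c in marker_counts.values() if c > 1)
    completeness = 100.0 * present / total
    contamination = 100.0 * extra_copies / total
    return completeness, contamination


# Hypothetical bin: three of four markers found, one of them duplicated.
completeness, contamination = marker_based_qc(
    {"marker_1": 1, "marker_2": 2, "marker_3": 0, "marker_4": 1}
)
```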

A subset of C. tetani genomes from archeological samples are of ancient origin

Using the tools MapDamage242 and pyDamage43, we then examined the 38 acBins for elevated C → T misincorporation rates at the ends of molecules, a characteristic pattern of aDNA damage19,20. Since these patterns are known to be affected by UDG treatment, we examined damage rates separately for full UDG, partial UDG, and untreated samples (Fig. 1c). As expected, we observed the highest damage rates in the untreated samples, and the lowest damage rates in the full UDG-treated samples, indicating that the damage rates have been suppressed in some samples by UDG treatment. The damage rates calculated by MapDamage and PyDamage were highly similar with a Pearson correlation of r = 0.99 (Supplementary Data 2). Damage plots for all samples are shown in Supplementary Fig. 5 with additional data available in Supplementary Data 9 and 10.
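The damage rate used here is the frequency with which a reference C is read as T at a given distance from the 5' end of a fragment. The sketch below shows that calculation on toy data; it is not the MapDamage or pyDamage implementation, and it assumes ungapped read/reference pairs already oriented 5' to 3'. The function name and the example alignments are hypothetical.

```python
def ct_misincorporation(alignments, n_positions=25):
    """Per-position 5' C->T frequency.

    alignments: iterable of (reference_segment, read) pairs, ungapped and
    oriented 5'->3'. Returns one frequency per position, or None where no
    reference C was observed at that position.
    """
    freqs = []
    for pos in range(n_positions):
        ref_c = 0
        c_to_t = 0
        for ref, read in alignments:
            if pos < len(ref) and pos < len(read) and ref[pos] == "C":
                ref_c += 1
                if read[pos] == "T":
                    c_to_t += 1
        freqs.append(c_to_t / ref_c if ref_c else None)
    return freqs


# Toy alignments (hypothetical): (reference segment, sequenced read).
alignments = [("CAGT", "TAGT"), ("CCGA", "CCGA"), ("CTTA", "TTTA"), ("GCAA", "GTAA")]
freqs = ct_misincorporation(alignments, n_positions=3)
```

The value at position 0 corresponds to the 5' C → T misincorporation frequency reported as the damage rate.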

Overall, seven acBins possessed a damage rate (5’ C → T misincorporation rate) exceeding 10%, which is indicative of aDNA21 (top 5 shown in Fig. 1d). In addition, all of the acBins except one (“Chincha-UC12-12-Tooth”) were verified by pyDamage as containing ancient contigs with q values < 0.01 (Supplementary Data 10). The highest damage rate (17.9%) occurred in the acBin from the “Augsburg-Tooth” sample, which is the third oldest sample in our dataset (~2253 BCE), despite this sample being partially UDG-treated (Fig. 1d). As controls, evidence of ancient DNA damage was also observed in the corresponding human mitochondrial DNA (mtDNA) from the same ancient samples (Supplementary Fig. 5 and Supplementary Data 2), but not for modern C. tetani samples (Fig. 1c). In addition, no damage was detected in the three human gut-derived C. tetani bins identified by our screen.

In general, we observed a significant correlation between damage rates of acBin DNA and corresponding human mtDNA from the same sample (R² = 0.38, P = 2.8E-03, two-sided Pearson) (Supplementary Fig. 6). However, acBin damage rates were generally lower than the corresponding human mtDNA rates, especially for some samples (e.g., Tenerife-004, Tenerife-013, Chinchorro-Mummy-Bone) (Supplementary Figs. 5 and 6), which may suggest that a subset of the archeological samples have been colonized by C. tetani at later dates (see “Discussion”). Damage rates were higher for noncapture datasets as these generally received no UDG treatment (Supplementary Fig. 7a), and higher for samples associated with regular inhumations than those from mummies (Supplementary Fig. 7b). We also observed a significant correlation between acBin damage level and sample age, but only for mummy-derived samples (R² = 0.50, P = 0.014) (Supplementary Fig. 7c). Together, these data suggest that a subset of the acBins display evidence of ancient DNA damage and are plausibly of an ancient origin.
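For readers who wish to reproduce this type of comparison, the squared Pearson correlation between paired damage rates can be computed as sketched below. The function name and the paired values are hypothetical and are not the study data; the P values reported above would additionally require a significance test, which is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical paired damage rates (acBin vs. mtDNA), for illustration only.
acbin_damage = [1.0, 2.0, 3.0]
mtdna_damage = [1.0, 3.0, 2.0]
r = pearson_r(acbin_damage, mtdna_damage)
r_squared = r ** 2
```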

Identification of novel C. tetani lineages and a potentially new Clostridium species from ancient samples

To explore the phylogenetic relationships between the acBins and modern C. tetani strains, we first aligned their contigs to the reference C. tetani genome along with 41 existing, non-redundant C. tetani genomes10, and clustered the genomes to produce a dendrogram (Fig. 2a). Five acBins were omitted due to extremely low (<1%) genome coverage (see “Methods”), which could result in phylogenetic artifacts. We also included C. cochlearium as an outgroup, as it is the closest known related species to C. tetani based on phylogenomic analysis of available genomes44,45. Assessment of the genome-wide alignment for potential recombination showed no difference in estimated recombination levels for acBins compared to modern C. tetani genomes (Supplementary Fig. 8).

Fig. 2: Phylogenetic analysis reveals known and novel lineages of C. tetani in ancient DNA samples, as well as a previously unidentified Clostridium species (“X”).

a Dendrogram depicting relationships of acBins from ancient samples with modern C. tetani genomes. Novel branches are labeled “X” and “Y”, which are phylogenetically distinct from existing C. tetani genomes. Shown on the right of the dendrogram are metadata and statistics associated with each acBin including the estimated date of the associated archeological sample. All metadata can be found in Supplementary Data 2. b Geographic distribution of ancient DNA samples from which the 38 acBins were identified. Each sample is colored based on the acBin clustering pattern shown in (a). The global map was derived from Natural Earth medium-scale data and plotted using the rnaturalearth and ggplot2 R packages. c SNP-based phylogenetic tree of a subset of acBins from lineage 1 and 2 showing high similarity and coverage to the C. tetani reference genome. See Supplementary Fig. 9 for more details. Source data for (a, c) are provided as a Source Data file.

The genome-based dendrogram of the acBins and modern C. tetani strains (Fig. 2a) matches the expected phylogenetic structure and contains all previously established C. tetani lineages10. Ultimately, the acBins can be subdivided into those that cluster clearly within existing C. tetani lineages 1 or 2 and those that do not, which we have labeled “X” (8 acBins) and “Y” (1 acBin). Visualization of the acBin samples on the world map revealed a tendency for geographical clustering among acBins from the same phylogenetic lineage (Fig. 2b). For example, lineage 1H acBins originate from ancient samples collected in the Americas, whereas most lineage 2 acBins originate outside of the Americas, and most clade X samples originate in Europe (Fig. 2b). Interestingly, some samples from the same region (e.g., Canary Island samples, and Egyptian samples) contain diverse C. tetani lineages, which may be influenced by several factors (see “Discussion”).

acBins from C. tetani lineages 1 and 2

Twenty-four acBins fall within the C. tetani tree and possess average nucleotide identities (ANIs) of 96.4% to 99.7% to the E88 reference genome (Supplementary Data 2), which is within the range considered to be the same species46. These include new members of clades 1B (1 acBin), 1F (1 acBin), 1H (9 acBins), and 2 (9 acBins), expanding the known genomic diversity of clade 1H, which previously contained a single strain, and clade 2, which previously contained five strains (Fig. 2a). Four additional acBins clustered generally within clade 1 but outside of established sublineages (Fig. 2a).

In addition, we used Parsnp47 to construct a more stringent, core SNP-based phylogeny from a reduced set of 11 acBins that aligned to the reference C. tetani genome and passed several criteria (see “Methods”) (Fig. 2c and Supplementary Fig. 9). Only acBins from established C. tetani lineages 1 and 2 passed these criteria, and their phylogenetic positioning is consistent with their clustering pattern (Fig. 2a). The reads associated with the core SNP alignment also showed reduced per-base heterogeneity when mapped to contigs (Supplementary Fig. 4b). Notably, acBins from the Sanganji, Tenerife, Chinchorro, and Chincha samples do not show evidence of branch shortening in the tree indicative of ancient genomes, and instead cluster with modern strains. These acBins tend to have higher rates of strain variation, which could affect branch lengths, or low damage rates potentially indicative of a more recent origin (Supplementary Data 2).

We also assembled a novel strain of C. tetani from a human gut sample (SRR10479805) which phylogenetically clustered with strain NCTC539 (98.7% average nucleotide identity; Supplementary Data 11) from lineage 1G. The other two identified human gut samples were removed from further analysis as they predominantly matched C. cochlearium based on BLAST analysis.

acBins from clade “X” and branch “Y”

Nine acBins clustered outside of the C. tetani species clade. Eight of these cluster together as part of a divergent clade (labeled “X”) (Fig. 2a). These samples span a large timeframe from ~2290 BCE to 1787 CE, are predominantly (7 of 8) of European origin (Fig. 2b and Supplementary Fig. 10), and come from variable burial contexts including single cave burials, cemeteries, mass graves and burial pits37,48,49,50,51,52,53 (Supplementary Data 2). Two of the samples from sites in Latvia and France are from plague (Y. pestis) victims37,53, and another is from an individual with tuberculosis38. The highest quality clade X acBin is from sample “Augsburg-Tooth” (~2253 BCE), with 53.9% estimated completeness and 4.11% contamination (Supplementary Data 8). Comparison of clade X acBins to other Clostridium species revealed that they are closer to C. tetani and C. cochlearium than any other Clostridium species available in the existing NCBI database, but are divergent enough to be considered a distinct species. On average, based on fastANI54 analysis of orthologous sequences, clade X genomes have 86.5 ± 1.7% ANI to C. tetani strain E88, and 85.1 ± 1.3% ANI to C. cochlearium (Supplementary Fig. 11a and Supplementary Data 12). Based on ANI analysis of the whole genome alignment, clade X genomes have 90.8 ± 0.22% ANI to strain E88 (Supplementary Data 2). These similarities were confirmed by analysis of BLAST alignment identities between clade X contigs and reference genomes (Supplementary Fig. 11b). As in the genome-wide tree, individual marker genes (rpsL, rpsG, and recA) from clade X acBins also clustered as divergent branches distinct from C. tetani and C. cochlearium (Supplementary Figs. 12–14). Finally, we re-examined the damage patterns according to phylogenetic clade, and found that clade X genomes possess the highest mean damage; 6/8 clade X genomes have a damage level exceeding 5% and 3/8 exceed 10% (Supplementary Fig. 7d and Supplementary Data 2). These analyses suggest that clade X may represent a previously unidentified lineage of Clostridium, including members of ancient origin. We designated this group Clostridium sp. X.

One sample (“GranCanaria-008-Tooth” from the Canary Islands dated to ~935 CE) also formed a single divergent branch (labeled “Y”) clustering outside all other C. tetani genomes (Fig. 2a). Based on CheckM analysis, this acBin is of moderate quality with 74% completeness, and 0.47% contamination (Supplementary Data 8). A comparison of the GranCanaria-008-Tooth acBin to the NCBI genome database revealed that it is closely related to C. tetani and more distant to other available Clostridium genomes (Supplementary Data 13). Based on fastANI54, it exhibits an ANI of 87.3% to C. tetani E88, and 85.1% to C. cochlearium, below the 95% threshold typically used for species assignment (Supplementary Data 13). Based on ANI analysis of the whole genome alignment, it has a 91.2% ANI to strain E88 (Supplementary Data 2). To further investigate the phylogenetic position of this species, we built gene-based phylogenies with the marker genes rpsL, rpsG, and recA (see Supplementary Figs. 12–14). Each of these three genes supports the GranCanaria-008-Tooth lineage as a divergent species distinct from C. tetani. The damage level for this acBin is relatively low (~4.0%), whereas its human mtDNA damage level is ~11.6% (Supplementary Fig. 5). We designated this acBin Clostridium sp. Y.
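The species-level reasoning applied to the ANI values can be expressed as a simple threshold check. The sketch below uses the 95% cutoff and ANI values quoted in the text; the function name, the constant name, and the dictionary labels are hypothetical, and a threshold comparison alone is of course only one line of evidence alongside the phylogenies.

```python
SPECIES_ANI_THRESHOLD = 95.0  # percent; cutoff typically used for species assignment


def same_species(ani_percent, threshold=SPECIES_ANI_THRESHOLD):
    """Return True if the ANI meets the species-level threshold."""
    return ani_percent >= threshold


# ANI to C. tetani E88, as reported in the text.
ani_to_e88 = {
    "lineage 1/2 acBins (lowest reported)": 96.4,
    "Clostridium sp. Y (fastANI)": 87.3,
    "Clostridium sp. Y (whole-genome alignment)": 91.2,
}
calls = {name: same_species(ani) for name, ani in ani_to_e88.items()}
```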

Genomic similarities and differences in C. tetani-related strains from ancient samples

We next carried out a comprehensive comparison of genome content and structure between the acBins and modern C. tetani strains. We first clustered protein-coding sequences from all modern genomes and acBins into a set of 3729 orthologous groups, and compared their presence/absence across all strains (see “Methods” and Supplementary Data 14). Based on this analysis, we observed considerable overlap in gene content between the acBins versus the modern reference genomes, with the greatest overlap observed between acBins from C. tetani lineages (1 and 2) and the smallest overlap observed for Clostridium sp. X (Supplementary Fig. 11c). For instance, plasmid genes from the E88 reference genome were on average detected in 61% of the most complete acBins from Fig. 1c (comparable to 69% in modern C. tetani genomes), and only 35% of other acBins (Supplementary Data 15). Twenty orthogroups from the E88 plasmid were found in all of these acBins, including the plasmid-specific genes repA, colT, and tent (Supplementary Data 15). In addition to these genes, sporulation-related genes are also highly conserved across the most complete acBins. Of 80 identified sporulation-related genes present in strain E88, 52 of these were detected in 100% of the most complete acBins, and 69/80 were present at over 90% frequency (Supplementary Data 16). Thus, we conclude that key C. tetani functions, including plasmid replication, collagen degradation, neurotoxin production, and sporulation, are conserved in a subset of acBins (i.e., those in Fig. 1c) for which enough genomic data was available to assemble genomes with moderate-high completeness.

We then examined genome similarities by visualizing the alignment of each genome to the reference E88 chromosome and plasmid (Fig. 3a). Several low-coverage acBins can be seen in C. tetani lineages 1 and 2 (Fig. 3a), which is expected given their low completeness estimates (Fig. 2a). However, the divergent GranCanaria-008-Tooth genome (branch “Y”) and Clostridium sp. X consistently have a low alignment coverage, similar to that of C. cochlearium (Fig. 3a), which we suspected may be due in part to these species being more distantly related to C. tetani. Consistent with the idea that clade X represents a distinct species from C. tetani, we identified fourteen genes present in four or more clade X members and absent from all other C. tetani genomes. The genomic context of four of these genes (labeled by orthogroup) is shown in Supplementary Fig. 15. Although these genes are unique to clade X, their surrounding genes are conserved in other C. tetani genomes, implying that genome rearrangements may have resulted in these genes being either gained in Clostridium sp. X or lost in C. tetani.

Fig. 3: Comparative genomics of acBins versus modern C. tetani strains.

a Visualization of the chromosomal and plasmid multiple sequence alignment. Orthologous blocks are shown in black and the missing sequence is colored white. The reference gene locations are plotted above the alignments. b Gene neighborhoods surrounding the repA gene (left) and tent gene (right) in modern strains versus acBins. Selected unique differences identified in acBin gene neighborhoods are highlighted. The boxed region shows the assembled tent locus in two clade X acBins. Comparison reveals a putative deletion event in the clade X strains that has removed the majority of the tent gene along with five upstream genes, leaving behind conserved flanking regions. See Supplementary Fig. 18 for more information. c Per-clade coverage of the tent gene normalized to the coverage of repA. The data include n = 33 biologically independent samples, including acBins from clade 1 (n = 3), 1B (n = 1), 1F (n = 1), 1H (n = 8), 2 (n = 9), X (n = 7), Y (n = 1), and acBins whose clade affiliation could not be determined (N.D., n = 3). The coverage was calculated as the average depth of coverage based on mapped reads to each gene. The boxplots depict the lower quartile, median, and upper quartile of the data, with whiskers extending to 1.5 times the interquartile range (IQR) above the third quartile or below the first quartile. See Supplementary Fig. 17 for the associated read pileups. Source data for (a–c) are provided as a Source Data file.

To examine differences in plasmid gene content and structure directly, we then compared the gene neighborhoods surrounding the plasmid-marker genes repA and colT (Fig. 3b, expanded data shown in Supplementary Fig. 16). In several acBins from C. tetani lineages 1 or 2, the gene neighborhoods surrounding these genes are similar to that in modern strains (Supplementary Fig. 16). However, particularly in Clostridium sp. X and Y, we identified unique gene clusters distinct from those in modern strains. For example, in two Clostridium sp. X genomes and the Clostridium sp. Y genome, we identified a conserved toxin/antitoxin pair and a phage integrase flanking the repA gene (Fig. 3b). In Clostridium sp. Y, these genes were found on an assembled 53.6 kb contig (“SAMEA104281224_k141_98912”), which was indeed predicted as a plasmid by the RFplasmid program with a 70.4% vote using the Clostridium model55. We also observed a unique gene arrangement surrounding colT that is conserved in two clade X genomes (Supplementary Fig. 16). Additional differences were identified in a few lineage 2 acBins; for example, Tenerife-004-Tooth contains unique genes neighboring repA, and the Tenerife-013-Tooth acBin uniquely encodes the repA gene adjacent to its tent and tetR gene (Fig. 3b).

We then performed a detailed comparison of the plasmid-encoded neurotoxin gene, tent, and its gene neighborhood (where possible) across the strains. As shown in Fig. 3a as well as based on mapped read coverage to these regions (Fig. 3c, Supplementary Fig. 17, and Supplementary Data 17), the tent gene was detected at a relatively high depth of coverage in acBins from C. tetani lineages 1 and 2. The tent gene neighborhood structure from lineage 1 or 2 acBin strains is also similar or identical to that in modern strains, with the exception of Tenerife-013-Tooth (as it encodes the repA gene nearby) (Fig. 3b).

However, in the acBins from lineage X and Y, the tent gene was either missing or fragmented, suggesting a possible gene loss or pseudogenization event (Fig. 3c). This pattern can be seen clearly in read coverage plots (Supplementary Fig. 17) and when normalizing tent depth of coverage to that of the plasmid-marker gene, repA (Fig. 3c). The tent locus in the two Clostridium sp. X genomes for which assembly data is available over this region appears to have undergone a deletion event that removed over 90% of the tent sequence as well as 3 neighboring genes (Fig. 3b and Supplementary Fig. 18). This analysis further supports the idea that the tent fragment may be a nonfunctional pseudogene in these clade X strains.
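The normalization of tent coverage to repA coverage amounts to a ratio of average depths. A minimal sketch follows, assuming per-base depths for each gene have already been extracted from the mapped reads; the function name and the toy depth values are hypothetical.

```python
def normalized_coverage(target_depths, marker_depths):
    """Mean depth of a target gene divided by mean depth of a marker gene.

    Returns None when the marker gene has no coverage, since the ratio is
    undefined in that case.
    """
    if not target_depths or not marker_depths:
        raise ValueError("depth lists must not be empty")
    target_mean = sum(target_depths) / len(target_depths)
    marker_mean = sum(marker_depths) / len(marker_depths)
    if marker_mean == 0:
        return None
    return target_mean / marker_mean


# Hypothetical per-base depths for tent and repA in a single sample.
tent_depths = [10, 12, 8]
repa_depths = [20, 20, 20]
ratio = normalized_coverage(tent_depths, repa_depths)
```

A ratio near 1 would indicate that tent is covered about as deeply as the plasmid marker, whereas a ratio near 0 is the pattern expected if tent is absent or largely deleted.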

Ultimately, our comparative genomic analysis of gene content and neighborhood structure demonstrates that the plasmids in several of the ancient samples (particularly those of Clostridium sp. X) are distinct from modern C. tetani plasmids, while the plasmids of acBins from lineages 1 and 2 are similar to those of existing C. tetani strains. This reinforces our earlier phylogenetic analysis indicating that clade X and branch Y represent new Clostridium species that are closely related to but distinct from C. tetani.

Identification and experimental testing of novel TeNT variants

Given the considerable scientific and biomedical importance of clostridial neurotoxins, we next focused on tent and reconstructed a total of 18 tent gene sequences (all from lineage 1 and 2 acBins) from aDNA using a sensitive variant calling pipeline (see “Methods”). Six tent sequences have complete coverage, and 12 have 75–99.9% coverage (Supplementary Data 18). Six partial tent sequences were also reconstructed but had lower average depth of coverage as shown in the read pileups (Supplementary Fig. 17). Four of the reconstructed tent sequences are identical to modern tent sequences, while 14 (including two identical sequences) are novel tent variants with 99.1–99.9% nucleotide identity to modern tent, comparable to the variation seen among modern tent genes (98.6–100%). We then built a phylogeny including the 18 tent genes from aDNA and all 12 modern tent sequences (Fig. 4a). The tent genes clustered into three subgroups with modern and aDNA-associated tent genes found in subgroups 1 and 2, and aDNA-associated tent genes forming a novel subgroup 3 (Fig. 4a). All three of the tent sequences in the novel tent subgroup 3 are from clade 1H aDNA strains.

Fig. 4: Analysis and experimental testing of a novel TeNT lineage identified from ancient DNA.

a Maximum-likelihood phylogenetic tree of tent genes including novel tent sequences assembled from ancient DNA samples and a non-redundant set of tent sequences from existing strains in which duplicates have been removed (see “Methods” for details). The phylogeny has been subdivided into three subgroups. Sequences are labeled according to sample followed by their associated clade in the genome-based tree (Fig. 2a), except for the Barcelona-3031-Tooth sequence (*) as it fell below the coverage threshold. b Visualization of tent sequence variation, with vertical bars representing nucleotide substitutions found uniquely in tent sequences from ancient DNA samples. On the right, a barplot is shown that indicates the number of unique substitutions found in each sequence, highlighting the uniqueness of subgroup 3. c Structural model of TeNT/Chinchorro indicating all of its unique amino acid substitutions, which are not observed in modern TeNT sequences. Also shown is a segment of the translated alignment for a specific N-terminal region of the TeNT protein (residues 141–149, Uniprot ID P04958). This sub-alignment illustrates a segment containing a high density of unique amino acid substitutions, four of which are shared in TeNT/El-Yaral and TeNT/Chinchorro. d MapDamage analysis of the tent/Chinchorro gene, and associated C. tetani contigs and mtDNA from the Chinchorro-Mummy-Bone sample. e Cultured rat cortical neurons were exposed to full-length toxins in culture medium at the indicated concentration for 12 h. Cell lysates were analyzed by immunoblot, and the image shown is a representative of four independent experiments. WT TeNT (uniprot accession # P04958) and TeNT/Chinchorro (“ch”) showed similar levels of activity in cleaving VAMP2 in neurons. f, g Full-length toxins ligated by sortase reaction were injected into the gastrocnemius muscles of the right hind limb of mice. 
The extent of muscle rigidity was monitored and scored for 4 days (means ± s.e.; n = 3 per group, 9 total). TeNT/Chinchorro (“ch”) induced typical spastic paralysis and showed a potency similar to WT TeNT. Source data for (a, b, d, e, g) are provided as a Source Data file.

We then visualized the uniqueness of aDNA-associated tent genes by mapping nucleotide substitutions onto the phylogeny (Fig. 4b and Supplementary Fig. 19), and focusing on “unique” tent substitutions found only in ancient samples and not in modern tent sequences. We identified a total of 46 such substitutions that are completely unique to one or more aDNA-associated tent genes (Fig. 4b, Supplementary Fig. 20, and Supplementary Data 19), which were statistically supported by the stringent variant calling pipeline (Supplementary Data 20). The largest number of unique substitutions occurred in tent/Chinchorro from tent subgroup 3, which is the oldest sample in our dataset (“Chinchorro mummy bone”, ~3889 BCE). tent/Chinchorro possesses 18 unique substitutions not found in modern tent, and 12 of these are shared with tent/El-Yaral and 10 with tent/Chiribaya (Fig. 4b). The three associated acBins also cluster as neighbors in the phylogenomic tree (Fig. 2a), and the three associated archeological samples originate from a similar geographic region in Peru and Chile (Supplementary Fig. 21). These shared patterns suggest a common evolutionary origin for these C. tetani strains and their unique neurotoxin genes and highlight tent subgroup 3 as a distinct group of tent variants exclusive to ancient samples (Fig. 4a).
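The notion of a "unique" substitution used here, a base in an ancient sequence that is not seen at the same alignment column in any modern sequence, can be sketched as follows. This is an illustration only and not the variant calling pipeline used in the study; it assumes pre-aligned sequences of equal length, and the function name and toy sequences are hypothetical.

```python
def unique_substitutions(ancient, modern_seqs, ignore="-N"):
    """List (1-based position, base) where the ancient base is absent
    from every modern sequence at that alignment column.

    Gaps and ambiguous bases (characters in `ignore`) are skipped.
    """
    if any(len(seq) != len(ancient) for seq in modern_seqs):
        raise ValueError("all sequences must be aligned to the same length")
    results = []
    for i, base in enumerate(ancient):
        if base in ignore:
            continue
        modern_bases = {seq[i] for seq in modern_seqs} - set(ignore)
        if modern_bases and base not in modern_bases:
            results.append((i + 1, base))
    return results


# Toy alignment (hypothetical sequences).
modern = ["ACGTAC", "ACGTTC"]
ancient = "ACCTTA"
unique = unique_substitutions(ancient, modern)
```

In this toy case, position 5 is not reported because the ancient base matches one of the modern sequences at that column.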

We then focused on tent/Chinchorro as a representative sequence of this group as its full-length gene sequence could be completely assembled. The 18 unique substitutions present in the tent/Chinchorro gene result in 12 unique amino acid substitutions, absent from modern TeNT protein sequences (L140S, E141K, P144T, S145N, A147T, T148P, T149I, P445T, P531Q, V653I, V806I, H924R) (Supplementary Data 21). Seven of these substitutions are spatially clustered within a surface loop on the TeNT structure56 and represent a potential mutation “hot spot” (Fig. 4c). Interestingly, 7/12 amino acid substitutions found in TeNT/Chinchorro are also shared with TeNT/El-Yaral and 5/12 are shared with TeNT/Chiribaya (Supplementary Data 21). As highlighted in Fig. 4c, TeNT/Chinchorro and TeNT/El-Yaral share a divergent 9-aa segment (amino acids 141–149 in TeNT, P04958) that is distinct from all other TeNT sequences. Reads mapping to the tent/Chinchorro gene show a low damage level similar to that seen in the C. tetani contigs from this sample, and their damage pattern is weaker than the corresponding damage pattern from the associated human mitochondrial DNA (Fig. 4d).
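The clustering of substitutions near the N-terminal region can be checked directly from the substitution list given above. The sketch below parses the standard notation (original residue, position, new residue) and counts substitutions falling in residues 140–149; that window is chosen here for illustration and is an assumption, as are the function and variable names.

```python
import re

# Substitutions as listed in the text for TeNT/Chinchorro.
SUBSTITUTIONS = (
    "L140S, E141K, P144T, S145N, A147T, T148P, T149I, "
    "P445T, P531Q, V653I, V806I, H924R"
)


def parse_substitutions(text):
    """Parse 'L140S'-style tokens into (original, position, new) tuples."""
    parsed = []
    for token in text.split(","):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


subs = parse_substitutions(SUBSTITUTIONS)
# Illustrative window covering residue 140 plus the 141-149 segment.
in_window = [s for s in subs if 140 <= s[1] <= 149]
```

With this window, 7 of the 12 parsed substitutions fall within residues 140–149.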

Given the phylogenetic novelty and unique pattern of substitutions observed for the tent/Chinchorro gene, we sought to determine whether it encodes an active tetanus neurotoxin. For biosafety reasons, we avoided the production of a tent/Chinchorro gene construct and instead used sortase-mediated ligation to produce limited quantities of full-length protein toxin (Supplementary Fig. 22), as done previously for other neurotoxins57,58. This involved producing two recombinant proteins in E. coli, one constituting the N-terminal fragment and another containing the C-terminal fragment of TeNT/Chinchorro, and then ligating these together using sortase. The resulting full-length TeNT/Chinchorro protein cleaved the canonical TeNT substrate, VAMP2, in cultured rat cortical neurons (Fig. 4e), and could be neutralized with anti-TeNT antisera (Supplementary Fig. 22). When injected into the hind leg muscle of mice, TeNT/Chinchorro induced spastic paralysis in vivo, producing a classic tetanus-like phenotype identical to that seen for wild-type TeNT (Fig. 4f). Quantification of muscle rigidity following TeNT and TeNT/Chinchorro exposure demonstrated that TeNT/Chinchorro exhibits a potency that is indistinguishable from TeNT (Fig. 4g). Together, these data demonstrate that the reconstructed tent/Chinchorro gene encodes an active and highly potent TeNT variant.
