Downloads
All metadata and results from calculations done by Pathogenwatch are available to download in various formats. Most downloads can be accessed in one of two ways: by selecting genomes in the Genome Browser view, clicking the "Selected Genomes" button in the top right corner and then "Downloads", or via the "Downloads" button in the Collection View. Additionally, assembly FASTA files and GFF files can be obtained by clicking the icons in the leftmost column of the Metadata Tables.
Downloads can take some time to generate for large collections.


The downloaded FASTA files are compressed using zip. Generally you'll need to unzip the bundle before using the files in other software.
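As a minimal sketch of the unzip step, the bundle can be extracted with Python's standard library. The function name and the paths passed to it are hypothetical; the only assumption about the download is that it is an ordinary zip archive.

```python
import zipfile
from pathlib import Path


def extract_fasta_bundle(bundle_path, out_dir):
    """Extract every file from a zip bundle and return the extracted paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(bundle_path) as bundle:
        bundle.extractall(out_dir)
        names = bundle.namelist()
    return [out_dir / name for name in names]
```

Calling `extract_fasta_bundle("downloaded_bundle.zip", "fasta_out")` (both names are placeholders) would leave the individual FASTA files in `fasta_out`, ready for other software.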
The original uploaded FASTA files are available for each assembly. They can be accessed from the Selected Genomes popup and from the Metadata Tables in the Collection View.
A GFF format output of the core, MLST, and AMR results is available for each assembly from the Metadata Tables in the Collection View.
{seqid} {source} {type} {start} {end} {s} {strand} {phase} {attributes}
##gff-version 3.2.1
ContigA Pathogenwatch_Core CDS_region 290 1000 90.2 + 0 ID=ContigA_290-1000;Name=fskA;Target=FskA 1 710;TargetLength=710;Note=Complete Match;Evalue=1e-100
ContigB Pathogenwatch_Core CDS_region 32900 35301 86.03 - 1 ID=ContigB_32900-35301;Name=grpA;Target=GrpA 2 2401;TargetLength=2499;Note=Partial Match;Evalue=1e-140
- The score ({s}) is the percent identity to the reference allele.
- The Name attribute is the Pathogenwatch gene family name. Target is the reference allele identifier and the start-end of the match.
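The record layout described above can be read with a few lines of Python. This is a sketch rather than an official parser: it assumes the downloaded file is tab-separated, as in standard GFF3 (the examples on this page are shown with spaces), and `parse_gff_line` is a hypothetical helper name.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into its nine columns and an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Attributes are key=value pairs separated by semicolons.
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),  # percent identity
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# The first core example record, rebuilt with tab separators (an assumption).
line = "\t".join([
    "ContigA", "Pathogenwatch_Core", "CDS_region", "290", "1000", "90.2",
    "+", "0",
    "ID=ContigA_290-1000;Name=fskA;Target=FskA 1 710;TargetLength=710;"
    "Note=Complete Match;Evalue=1e-100",
])
record = parse_gff_line(line)
# Target holds the reference allele identifier and the start-end of the match.
target_id, target_start, target_end = record["attributes"]["Target"].split()
print(record["attributes"]["Name"], record["score"], target_id, target_start, target_end)
```

Comment lines such as the `##gff-version` directive are not feature lines and would need to be skipped before calling the helper.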
{seqid} {source} {type} {start} {end} {s} {strand} {phase} {attributes}
ContigA Pathogenwatch_MLST genetic_marker 500 799 98.5 + 1 ID=MLST_ContigA_500-799;Name=arcc;Target=arcc_1001 1 300;Note=Allele 1001;Evalue=1e-120
- The score ({s}) is the percent identity to the reference allele.
- The Name attribute is the Pathogenwatch gene family name. Target is the reference allele identifier and the start-end of the match.
{seqid} {source} {type} {start} {end} {s} {strand} {phase} {attributes}
ContigA Pathogenwatch_PAAR CDS 300 599 87.5 + 1 ID=PAAR_ContigA_300-599;Name=mecA;Target=mecA 1 399;Note=Methicillin;Evalue=1e-100;TargetLength=399
ContigB Pathogenwatch_SNPAR CDS 200 499 99.5 + 1 ID=SNPAR_23S_RNA_1;Name=23S_RNA_1;Target=23S_RNA 1 399;Evalue=1e-120;TargetLength=399
ContigB Pathogenwatch_SNPAR point_mutation 251 251 . + . ID=SNPAR_23S_RNA_1_T52A;Name=23S_RNA_T52A;Parent=SNPAR_23S_RNA_1;Note=Erythromycin,Ciprofloxacin
ContigB Pathogenwatch_SNPAR CDS 1000 3000 99.0 + 1 ID=SNPAR_cpA_1;Name=cipA_1;Target=cipA1 1 2001;Evalue=1e-140;TargetLength=2001
ContigB Pathogenwatch_SNPAR point_mutation 1784 1784 . + . ID=SNPAR_cpA_1_A262P;Name=cpA_A262P;Parent=SNPAR_cpA_1;Note=Cipidydoxin
- "_PAAR" are presence/absence matches.
- "_SNPAR" are for gene variants.
Notes on "_SNPAR"
- Consists of two components: (1) the match to the SNPAR gene (type=CDS) and (2) the individual resistance mutations (type=point_mutation).
- The CDS record describes the reference gene and match statistics. There can be 1 or more mutation records that cause an amino acid change leading to resistance. Each mutation should have a reference to a parent CDS feature.
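The parent/child relationship between CDS and point_mutation records can be followed programmatically. The sketch below assumes tab-separated GFF3 lines and uses the hypothetical helper name `mutations_by_gene`; it groups each mutation under the CDS named in its Parent attribute and splits the Note field on commas to list the associated antibiotics.

```python
from collections import defaultdict


def parse_attributes(column):
    """Turn a GFF3 attribute column into a dict."""
    return dict(pair.split("=", 1) for pair in column.split(";") if pair)


def mutations_by_gene(gff_lines):
    """Map each SNPAR CDS ID to the point mutations that name it as Parent."""
    genes = []
    mutations = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != "Pathogenwatch_SNPAR":
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "CDS":
            genes.append(attrs["ID"])
        elif cols[2] == "point_mutation" and "Parent" in attrs:
            note = attrs.get("Note", "")
            drugs = note.split(",") if note else []
            mutations[attrs["Parent"]].append((attrs["Name"], int(cols[3]), drugs))
    return {gene_id: mutations.get(gene_id, []) for gene_id in genes}


# Two of the example records, rebuilt with tab separators (an assumption).
lines = [
    "\t".join(["ContigB", "Pathogenwatch_SNPAR", "CDS", "200", "499", "99.5", "+", "1",
               "ID=SNPAR_23S_RNA_1;Name=23S_RNA_1;Target=23S_RNA 1 399;"
               "Evalue=1e-120;TargetLength=399"]),
    "\t".join(["ContigB", "Pathogenwatch_SNPAR", "point_mutation", "251", "251", ".", "+", ".",
               "ID=SNPAR_23S_RNA_1_T52A;Name=23S_RNA_T52A;Parent=SNPAR_23S_RNA_1;"
               "Note=Erythromycin,Ciprofloxacin"]),
]
result = mutations_by_gene(lines)
print(result)
```

A CDS with no mutation records maps to an empty list, which makes it easy to distinguish a matched gene without resistance mutations from one that carries them.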
Apart from the FASTA and GFF files, the rest of the downloads are plain text CSV for easy use with other software.
The overall AMR profile - i.e. the antibiotics for which potential resistance genotypes have been identified for each assembly - is available from the Selected Genomes popup and the Collection View Downloads. The header consists of genome ID, genome name, analysis module version and each antibiotic for that species. Possible values are "NOT_FOUND", "INTERMEDIATE" and "RESISTANT".
Genome ID | Genome Name | Version | Amikacin | Penicillin | Tobramycin |
5af2b16630c1ce57566f4809 | 73_ES_2858 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
5af2b16630c1ce66cd6f480b | 79_PT_30 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
5af2b16930c1cead386f480d | 77_SE_2879 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
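A short sketch of reading the AMR profile with Python's csv module follows. It assumes the file is comma-separated with the header names shown in the table above; `resistance_calls` is a hypothetical helper, and the sample rows are copied from the example.

```python
import csv
import io

# Columns that identify the genome rather than an antibiotic.
ID_COLUMNS = {"Genome ID", "Genome Name", "Version"}


def resistance_calls(csv_text, wanted=("RESISTANT", "INTERMEDIATE")):
    """Return {genome name: [antibiotics whose value is in `wanted`]}."""
    calls = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        calls[row["Genome Name"]] = [
            drug for drug, value in row.items()
            if drug not in ID_COLUMNS and value in wanted
        ]
    return calls


sample = (
    "Genome ID,Genome Name,Version,Amikacin,Penicillin,Tobramycin\n"
    "5af2b16630c1ce57566f4809,73_ES_2858,v2,NOT_FOUND,RESISTANT,NOT_FOUND\n"
    "5af2b16630c1ce66cd6f480b,79_PT_30,v2,NOT_FOUND,RESISTANT,NOT_FOUND\n"
)
calls = resistance_calls(sample)
print(calls)
```

For a file on disk, the same loop works with `csv.DictReader(open(path, newline=""))` in place of the in-memory text.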
Downloads in the Collection View will reflect the current selection.
In the Collection View Downloads the presence/absence ("1"/"0") profile of AMR-associated sequence variants and AMR-associated genes can be downloaded. From both download menus, a CSV file of the resistance profile can be downloaded for selected assemblies. Example output for some Staphylococcus aureus assemblies can be seen in the table below.
Genome ID | Genome Name | Version | Amikacin | Gentamicin | Tobramycin | Kanamycin | Methicillin | Penicillin | Fusidic Acid |
5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
5b30deacd9d9a8540c2aed2f | Cow_A.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
The cgMLST assignments for each assembly are available in CSV format from both download menus. Example output is shown below.
Genome ID | Genome Name | Version | Gene | Allele ID | Start | End | Contig | Direction |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0001 | 2 | 145292 | 143931 | ERS049983.7092_7_81.4 | reverse |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0002 | 33 | 143651 | 142518 | ERS049983.7092_7_81.4 | reverse |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0003 | 2 | 142128 | 141892 | ERS049983.7092_7_81.4 | reverse |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0004 | 2 | 141895 | 140783 | ERS049983.7092_7_81.4 | reverse |
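The cgMLST rows can be collapsed into one allele profile per genome. This sketch assumes a comma-separated file whose first column is the genome ID, followed by the columns shown above, and that coordinates are 1-based and inclusive; `allele_profiles` is a hypothetical helper name.

```python
import csv
import io


def allele_profiles(csv_text):
    """Return {genome name: {gene: (allele ID, match length)}} from cgMLST CSV text."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Reverse-strand matches list Start > End, so take the absolute span.
        span = abs(int(row["End"]) - int(row["Start"])) + 1
        profiles.setdefault(row["Genome Name"], {})[row["Gene"]] = (row["Allele ID"], span)
    return profiles


sample = (
    "Genome ID,Genome Name,Version,Gene,Allele ID,Start,End,Contig,Direction\n"
    "5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,"
    "SAUR0001,2,145292,143931,ERS049983.7092_7_81.4,reverse\n"
    "5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,"
    "SAUR0002,33,143651,142518,ERS049983.7092_7_81.4,reverse\n"
)
profiles = allele_profiles(sample)
print(profiles["Patient_A1.fasta"])
```

Keeping the allele ID as a string avoids assumptions about whether every scheme uses purely numeric identifiers.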
MLST assignments can be downloaded from both of the download menus. The source of the schema (e.g. PubMLST) is shown and linked to in the download menu. Example output for Staphylococcus aureus can be seen below:
Genome ID | Genome Name | Version | ST | arcC | aroE | glpF | gmk | pta | tpi | yqiL |
5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
Genotyphi is only run for Salmonella Typhi. Result CSVs are available from both download menus. Example results are shown below.
Genome ID | Genome Name | Version | Genotype | SNPs Called |
5a69cfc256aeb700010dcffc | 007898.fasta | v2 | 4.3.1 | 68 |
5a69cfc456aeb700010dcffe | 404Ty.fasta | v2 | 3.1.2 | 68 |
5a69cfc456aeb700010dd000 | 11909_3.fasta | v2 | 2.0.2 | 69 |
5a69cffab0c5b70001796add | Ty2.fasta | v2 | 4.1 | 68 |
The detailed Speciator output can be downloaded from both the Collection View and the Selected Genomes popup. Example output for a set of Staphylococcus aureus assemblies can be seen below.
Genome ID | Genome Name | Version | Organism Name | Organism ID | Species Name | Species ID | Genus Name | Genus ID | Reference ID | Matching Hashes | p-Value | Mash Distance |
5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001197935.1 | 400/400 | 0 | 0 |
5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001208645.1 | 399/400 | 0 | 7.82718E-05 |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001203975.1 | 400/400 | 0 | 0 |
5b30deacd9d9a8540c2aed2f | Cow_A.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001193975.1 | 399/400 | 0 | 7.82718E-05 |
5b30deaed9d9a846cf2aed35 | Patient_A2.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_000982735.1 | 400/400 | 0 | 0 |
5b30deaed9d9a8734e2aed31 | Patient_B.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001197935.1 | 400/400 | 0 | 0 |
5b30deaed9d9a8bca52aed33 | Sheep_B1.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001209085.1 | 400/400 | 0 | 0 |
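The Matching Hashes and Mash Distance columns are related through the standard Mash distance formula, D = -ln(2j / (1 + j)) / k, where j is the fraction of matching hashes and k is the k-mer size. The sketch below is illustrative only: the k-mer size of 16 is an assumption, chosen because it reproduces the 7.82718E-05 shown for 399/400 in the table, and is not taken from the Speciator documentation.

```python
import math


def mash_distance(matching_hashes, kmer_size):
    """Estimate the Mash distance from a 'shared/total' matching-hashes string."""
    shared, total = (int(part) for part in matching_hashes.split("/"))
    if shared == 0:
        return 1.0  # no shared hashes: maximum distance
    jaccard = shared / total
    if jaccard == 1:
        return 0.0  # identical sketches
    return -math.log(2 * jaccard / (1 + jaccard)) / kmer_size


# kmer_size=16 is an assumed value, not a documented Speciator setting.
print(mash_distance("400/400", 16))
print(mash_distance("399/400", 16))
```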
The assembly quality statistics calculated for each genome can be downloaded from the Collection View downloads and the Selected Genomes popup. An example of the fields and output is given below.
Genome Name | Version | Genome Length | No. Contigs | Smallest Contig | Largest Contig | Average Contig Length | N50 | non-ATCG | GC Content |
73_ES_2858 | v2 | 2774944 | 53 | 605 | 239743 | 52357 | 117181 | 152 | 32.7 |
79_PT_30 | v2 | 2846170 | 46 | 398 | 336545 | 61873 | 110036 | 495 | 32.7 |
77_SE_2879 | v2 | 2787435 | 50 | 326 | 481787 | 55748 | 96906 | 150 | 32.7 |
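For readers who want to cross-check these fields against their own assemblies, the sketch below computes them from a list of contig sequences. The definitions are assumptions rather than the Pathogenwatch implementation: N50 is the length of the contig at which the running total, taken from the largest contig down, first reaches half the genome length; the average is truncated to an integer; and GC content is a percentage of the total length.

```python
def assembly_stats(contigs):
    """Compute basic assembly statistics from a list of contig sequences."""
    if not contigs:
        raise ValueError("at least one contig is required")
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)

    # N50: walk from the largest contig down until half the genome is covered.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    joined = "".join(contigs).upper()
    gc = joined.count("G") + joined.count("C")
    non_acgt = sum(1 for base in joined if base not in "ACGT")
    return {
        "Genome Length": total,
        "No. Contigs": len(lengths),
        "Smallest Contig": lengths[-1],
        "Largest Contig": lengths[0],
        "Average Contig Length": total // len(lengths),
        "N50": n50,
        "non-ATCG": non_acgt,
        "GC Content": round(100 * gc / total, 1),
    }


# Toy contigs for illustration only; not real assembly data.
stats = assembly_stats(["GGCCAATT", "GCAT", "ATNN"])
print(stats)
```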
The presence/absence table of Pathogenwatch scheme core matches is available from the Collection View downloads. Family names are given in the column header, with one row per assembly. The proportion of the reference covered by the match is given as a value between 0 and 1 ("0" is absent, "1" is complete and present). Multiple hits to the same family are separated with semicolons.
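Because a single cell may hold several semicolon-separated values, it helps to split cells before doing arithmetic on them. A minimal sketch, with hypothetical helper names and made-up cell strings:

```python
def parse_overlap_cell(cell):
    """Split a core-match cell such as '0.97;1' into a list of floats."""
    return [float(part) for part in cell.split(";") if part.strip()]


def has_complete_match(cell):
    """True when at least one hit covers the whole reference (value 1)."""
    return any(value == 1.0 for value in parse_overlap_cell(cell))


# Illustrative cell values, not taken from a real download.
print(parse_overlap_cell("0.97;1"))
print(has_complete_match("0.97;1"), has_complete_match("0"))
```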
The final score matrix used to build the neighbour-joining collection trees is available from the Collection View downloads. The score is the number of differences normalised by the percentage of the core matched, and so is typically almost identical to the number of differences.
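One way to read this normalisation, offered here purely as an assumed interpretation and not as the Pathogenwatch formula, is to divide the difference count by the fraction of the core that was matched. When nearly all of the core is matched, the divisor is close to 1 and the score stays close to the raw count.

```python
def normalised_score(differences, core_fraction_matched):
    """Scale a difference count by the fraction of the core that was matched.

    This formula is an assumption made for illustration only.
    """
    if not 0 < core_fraction_matched <= 1:
        raise ValueError("core_fraction_matched must be in (0, 1]")
    return differences / core_fraction_matched


# Illustrative numbers: 42 differences with 99% of the core matched.
print(normalised_score(42, 1.0))
print(round(normalised_score(42, 0.99), 2))
```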
The number of counted differences between each pair of assemblies in the current selection can be downloaded from the Collection View. For more details on how this is calculated see the Core Genome Tree section.
A summary of variance seen in the selected genomes, also sub-grouped by nearest reference, can be downloaded. An example of the output is given below.