Downloads
Accessing the Downloads
All metadata and results from calculations done by Pathogenwatch are available to download in various formats. Most downloads can be accessed in one of two ways: select genomes in the Genome Browser view, click the "Selected Genomes" button in the top right corner and then "Downloads"; or use the "Downloads" button in the Collection View. Additionally, assembly FASTA files and GFF files can be obtained by clicking the icons in the leftmost column of the Metadata Tables.
Downloads can take some time to generate for large collections.
Available Downloads
The downloaded FASTA files are compressed using zip. Generally you'll need to unzip the bundle before using the files in other software.
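As a minimal sketch of the unzip step, the bundle can be extracted with Python's standard library. The function name and the paths in the usage note are hypothetical, and the layout of files inside the bundle is not documented here, so the function simply returns whatever the archive contains.

```python
import zipfile
from pathlib import Path


def extract_fasta_bundle(bundle_path, dest_dir):
    """Unzip a downloaded bundle and return the paths of the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(bundle_path) as bundle:
        bundle.extractall(dest)
        names = bundle.namelist()
    # Directory entries end with "/"; keep only regular files.
    return [dest / name for name in names if not name.endswith("/")]
```

For example, `extract_fasta_bundle("downloads.zip", "fasta_out")` would return the list of extracted files, where both names are placeholders for your own paths.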
FASTA Files
The original uploaded FASTA files for each assembly. The FASTA downloads can be accessed from the Selected Genomes popup and from the Metadata tables in the Collection.
GFF Files
A GFF format output of the core, MLST, and AMR results is available for each assembly from the Metadata Tables in the Collection View.
Core Matches
The score ({s}) is the percent identity to the reference allele. The Name attribute is the Pathogenwatch gene family name. Target is the reference allele identifier and the start-end of the match.
MLST Matches
The score ({s}) is the percent identity to the reference allele. The Name attribute is the Pathogenwatch gene family name. Target is the reference allele identifier and the start-end of the match.
AMR Matches
"_PAAR" are presence/absence matches.
"_SNPAR" are for gene variants.
Notes on "_SNPAR"
A "_SNPAR" match consists of two components: (1) the match to the SNPAR gene (type=CDS) and (2) the individual resistance mutations (type=point_mutation).
The CDS record describes the reference gene and match statistics. There can be 1 or more mutation records that cause an amino acid change leading to resistance. Each mutation should have a reference to a parent CDS feature.
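The parent/child relationship described above can be walked with a short script. This is a sketch only: it assumes the file follows the usual GFF3 convention of tab-separated columns with `ID=` and `Parent=` keys in the ninth column, which the text does not confirm, and the two example records (contig name, source, gene and mutation names) are invented for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def group_mutations_by_cds(gff_text):
    """Return {cds_id: {"cds": attrs, "mutations": [attrs, ...]}}."""
    features = defaultdict(lambda: {"cds": None, "mutations": []})
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_type, attrs = cols[2], parse_attributes(cols[8])
        if feature_type == "CDS" and "ID" in attrs:
            features[attrs["ID"]]["cds"] = attrs
        elif feature_type == "point_mutation" and "Parent" in attrs:
            features[attrs["Parent"]]["mutations"].append(attrs)
    return dict(features)


# Hypothetical records, invented for illustration only.
EXAMPLE = (
    "contig_1\texample\tCDS\t100\t400\t99.5\t+\t0\tID=cds1;Name=geneA_SNPAR\n"
    "contig_1\texample\tpoint_mutation\t250\t250\t.\t+\t.\tID=mut1;Parent=cds1;Name=S84L\n"
)

grouped = group_mutations_by_cds(EXAMPLE)
for cds_id, entry in grouped.items():
    print(cds_id, [m["Name"] for m in entry["mutations"]])
```

If the real files use different attribute keys, only the `"ID"` and `"Parent"` lookups need to change.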
Apart from the FASTA and GFF files, the rest of the downloads are plain text CSV for easy use with other software.
AMR
AMR Profile
The overall AMR profile, i.e. the antibiotics for which potential resistance genotypes have been identified for each assembly, is available from the Selected Genomes popup and the Collection View Downloads. The header consists of the genome ID, genome name, analysis module version and each antibiotic for that species. Possible values are "NOT_FOUND", "INTERMEDIATE" and "RESISTANT".
Genome ID | Genome Name | Version | Amikacin | Penicillin | Tobramycin |
5af2b16630c1ce57566f4809 | 73_ES_2858 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
5af2b16630c1ce66cd6f480b | 79_PT_30 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
5af2b16930c1cead386f480d | 77_SE_2879 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
Downloads in the Collection View will reflect the current selection.
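A downloaded AMR profile can be summarised per genome with the standard `csv` module. The sketch below assumes a comma-separated file whose header matches the example table; the embedded data is the example rewritten as CSV, and the function name is hypothetical.

```python
import csv
import io

# The example table above, rewritten as comma-separated text (assumed layout).
AMR_CSV = """Genome ID,Genome Name,Version,Amikacin,Penicillin,Tobramycin
5af2b16630c1ce57566f4809,73_ES_2858,v2,NOT_FOUND,RESISTANT,NOT_FOUND
5af2b16630c1ce66cd6f480b,79_PT_30,v2,NOT_FOUND,RESISTANT,NOT_FOUND
5af2b16930c1cead386f480d,77_SE_2879,v2,NOT_FOUND,RESISTANT,NOT_FOUND
"""

# Columns that describe the genome rather than an antibiotic.
ID_COLUMNS = {"Genome ID", "Genome Name", "Version"}


def flagged_antibiotics(handle):
    """Map each genome name to {antibiotic: call} for every call other than NOT_FOUND."""
    result = {}
    for row in csv.DictReader(handle):
        result[row["Genome Name"]] = {
            drug: call
            for drug, call in row.items()
            if drug not in ID_COLUMNS and call != "NOT_FOUND"
        }
    return result


profile = flagged_antibiotics(io.StringIO(AMR_CSV))
for genome, calls in profile.items():
    print(genome, calls)
```

With a real download, replace `io.StringIO(AMR_CSV)` with an open file handle, for example `open(path, newline="")`.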
AMR SNPs & Genes
In the Collection View Downloads the presence/absence ("1"/"0") profile of AMR-associated sequence variants and AMR-associated genes can be downloaded. From both download menus, a CSV file of the resistance profile can be downloaded for selected assemblies. Example output for some Staphylococcus aureus assemblies can be seen in the table below.
Genome ID | Genome Name | Version | Amikacin | Gentamicin | Tobramycin | Kanamycin | Methicillin | Penicillin | Fusidic Acid |
5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
5b30deacd9d9a8540c2aed2f | Cow_A.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
cgMLST
The cgMLST assignments for each assembly are available in CSV format from both download menus. Example output is shown below.
Genome ID | Genome Name | Version | Gene | Allele ID | Start | End | Contig | Direction |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0001 | 2 | 145292 | 143931 | ERS049983.7092_7_81.4 | reverse |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0002 | 33 | 143651 | 142518 | ERS049983.7092_7_81.4 | reverse |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0003 | 2 | 142128 | 141892 | ERS049983.7092_7_81.4 | reverse |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0004 | 2 | 141895 | 140783 | ERS049983.7092_7_81.4 | reverse |
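Because the cgMLST file has one row per gene per genome, it is often convenient to pivot it into one allele profile per genome. The sketch below assumes a comma-separated file with a genome ID column followed by the columns in the example; the function names are hypothetical. The match length calculation further assumes inclusive coordinates, with Start greater than End on reverse-strand matches as in the example rows.

```python
import csv
import io

# The example rows above, rewritten as comma-separated text (assumed layout).
CGMLST_CSV = """Genome ID,Genome Name,Version,Gene,Allele ID,Start,End,Contig,Direction
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,SAUR0001,2,145292,143931,ERS049983.7092_7_81.4,reverse
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,SAUR0002,33,143651,142518,ERS049983.7092_7_81.4,reverse
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,SAUR0003,2,142128,141892,ERS049983.7092_7_81.4,reverse
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,SAUR0004,2,141895,140783,ERS049983.7092_7_81.4,reverse
"""


def allele_profiles(rows):
    """Pivot per-gene rows into {genome name: {gene: allele ID}}."""
    profiles = {}
    for row in rows:
        profiles.setdefault(row["Genome Name"], {})[row["Gene"]] = row["Allele ID"]
    return profiles


def match_length(row):
    """Length of the match on the contig, assuming inclusive coordinates."""
    return abs(int(row["Start"]) - int(row["End"])) + 1


rows = list(csv.DictReader(io.StringIO(CGMLST_CSV)))
profiles = allele_profiles(rows)
lengths = {row["Gene"]: match_length(row) for row in rows}
print(profiles["Patient_A1.fasta"])
print(lengths)
```

Two genomes can then be compared by counting the genes at which their profiles hold different allele IDs.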
MLST
MLST assignments can be downloaded from both of the download menus. The source of the schema (e.g. PubMLST) is shown and linked to in the download menu. Example output for Staphylococcus aureus can be seen below:
Genome ID | Genome Name | Version | ST | arcC | aroE | glpF | gmk | pta | tpi | yqiL |
5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
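A common first step with the MLST download is to group genomes by sequence type. The sketch below assumes a comma-separated file with a genome ID column followed by the columns in the example; the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The example rows above, rewritten as comma-separated text (assumed layout).
MLST_CSV = """Genome ID,Genome Name,Version,ST,arcC,aroE,glpF,gmk,pta,tpi,yqiL
5b30dea5d9d9a8a7c12aed29,Sheep_B2.fasta,20180516172348-v1.6.1,130,6,57,45,2,7,58,52
5b30dea6d9d9a8879a2aed2b,Sheep_B3.fasta,20180516172348-v1.6.1,130,6,57,45,2,7,58,52
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,130,6,57,45,2,7,58,52
"""


def genomes_by_st(handle):
    """Group genome names by the value in the ST column, keeping file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["ST"]].append(row["Genome Name"])
    return dict(groups)


groups = genomes_by_st(io.StringIO(MLST_CSV))
for st, genomes in groups.items():
    print(f"ST {st}: {', '.join(genomes)}")
```

The ST is kept as a string so that any non-numeric values in the column are grouped without raising an error.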
Genotyphi
Genotyphi is only run for Salmonella Typhi. Result CSVs are available from both the download menus. Example results are shown below.
Genome ID | Genome Name | Version | Genotype | SNPs Called |
5a69cfc256aeb700010dcffc | 007898.fasta | v2 | 4.3.1 | 68 |
5a69cfc456aeb700010dcffe | 404Ty.fasta | v2 | 3.1.2 | 68 |
5a69cfc456aeb700010dd000 | 11909_3.fasta | v2 | 2.0.2 | 69 |
5a69cffab0c5b70001796add | Ty2.fasta | v2 | 4.1 | 68 |
Speciation
The detailed Speciator output can be downloaded from both the Collection View and the Selected Genomes popup. Example output for a set of Staphylococcus aureus assemblies can be seen below.
Genome ID | Genome Name | Version | Organism Name | Organism ID | Species Name | Species ID | Genus Name | Genus ID | Reference ID | Matching Hashes | p-Value | Mash Distance |
5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001197935.1 | 400/400 | 0 | 0 |
5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001208645.1 | 399/400 | 0 | 7.82718E-05 |
5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001203975.1 | 400/400 | 0 | 0 |
5b30deacd9d9a8540c2aed2f | Cow_A.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001193975.1 | 399/400 | 0 | 7.82718E-05 |
5b30deaed9d9a846cf2aed35 | Patient_A2.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_000982735.1 | 400/400 | 0 | 0 |
5b30deaed9d9a8734e2aed31 | Patient_B.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001197935.1 | 400/400 | 0 | 0 |
5b30deaed9d9a8bca52aed33 | Sheep_B1.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001209085.1 | 400/400 | 0 | 0 |
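To illustrate how the Matching Hashes and Mash Distance columns relate, the sketch below applies the standard Mash distance formula, D = -ln(2j / (1 + j)) / k, where j is the fraction of matching hashes. The k-mer size of 16 is an assumption: it is not stated here, but it reproduces the distance shown for 399/400 in the example rows. The function names are hypothetical.

```python
import math


def mash_distance(matching, sketch_size, kmer_size=16):
    """Mash distance from shared hash counts; kmer_size=16 is an assumed value."""
    j = matching / sketch_size  # Jaccard estimate from the sketch
    if j == 0:
        return 1.0  # no shared hashes: maximum distance
    if j == 1:
        return 0.0  # identical sketches
    return -math.log(2 * j / (1 + j)) / kmer_size


def distance_from_field(field):
    """Parse a "Matching Hashes" value such as "399/400"."""
    matching, sketch_size = (int(part) for part in field.split("/"))
    return mash_distance(matching, sketch_size)


print(distance_from_field("400/400"))
print(distance_from_field("399/400"))
```

Under this assumption, a single mismatching hash out of 400 corresponds to a distance of roughly 7.83E-05, in line with the table.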
Stats
The assembly quality statistics calculated for each genome can be downloaded from the Collection View downloads and the Selected Genomes popup. An example of the fields and output is given below.
Genome Name | Version | Genome Length | No. Contigs | Smallest Contig | Largest Contig | Average Contig Length | N50 | non-ATCG | GC Content |
73_ES_2858 | v2 | 2774944 | 53 | 605 | 239743 | 52357 | 117181 | 152 | 32.7 |
79_PT_30 | v2 | 2846170 | 46 | 398 | 336545 | 61873 | 110036 | 495 | 32.7 |
77_SE_2879 | v2 | 2787435 | 50 | 326 | 481787 | 55748 | 96906 | 150 | 32.7 |
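For readers unfamiliar with these fields, the sketch below computes N50 and the average contig length from a list of contig lengths. It uses the conventional N50 definition and is not necessarily how Pathogenwatch computes it. The use of integer division for the average is an inference from the example rows (for instance 2787435 / 50 = 55748.7 is reported as 55748), and the contig lengths in the example are invented.

```python
def n50(contig_lengths):
    """Smallest contig length such that contigs of at least that length
    cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def average_contig_length(genome_length, contig_count):
    """Average contig length, truncated to an integer (assumed from the examples)."""
    return genome_length // contig_count


# Hypothetical contig lengths, invented for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
print(average_contig_length(2787435, 50))
```

The same two functions can be applied to contig lengths read from any assembly FASTA file.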
Core Allele Distribution
The presence/absence table of Pathogenwatch scheme core matches is available from the Collection View downloads. Family names are given in the column header, with one row per assembly. The overlap with the reference is given as a fraction between 0 and 1 ("0" is absent, "1" is complete and present). Multiple hits to the same family are separated with semi-colons.
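Each cell of this table can be interpreted with a small helper. The sketch below assumes a cell holds one or more numbers between 0 and 1 joined by semi-colons, as described above; the function names, the status labels, and the treatment of an empty cell as absent are illustrative choices rather than part of the file format.

```python
def parse_overlaps(cell):
    """Split a cell such as "0.98;1" into a list of overlap fractions."""
    return [float(value) for value in cell.split(";") if value.strip()]


def classify(cell):
    """Return (status, hit_count) for one family in one assembly.

    status is "absent", "partial" or "complete", based on the best hit;
    hit_count is the number of hits with a non-zero overlap.
    """
    overlaps = parse_overlaps(cell)
    best = max(overlaps, default=0.0)
    hits = sum(1 for value in overlaps if value > 0)
    if best == 0:
        return "absent", hits
    if best == 1:
        return "complete", hits
    return "partial", hits


for cell in ["0", "1", "0.75", "0.98;1"]:
    print(cell, classify(cell))
```

A hit count above one flags families with multiple hits, which may be worth checking as possible paralogues or fragmented matches.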
Score Matrix
The final score matrix used to build the neighbour-joining collection trees is available from the Collection View downloads. The score is the number of differences normalised by the percentage of the core matched, and so is typically almost identical to the number of differences.
Difference Matrix
The number of counted differences between each pair of assemblies in the current selection can be downloaded from the Collection View. For more details on how this is calculated see the Core Genome Tree section.
Variance Summary
A summary of variance seen in the selected genomes, also sub-grouped by nearest reference, can be downloaded. An example of the output is given below.