Downloads
All metadata and results from calculations done by Pathogenwatch are available to download in various formats. Most downloads can be accessed in one of two ways: select genomes in the Genome Browser view, click the "Selected Genomes" button in the top right corner and then "Downloads"; or use the "Downloads" button in the Collection View. Additionally, assembly FASTA files and GFF files can be obtained by clicking the icons in the leftmost column of the Metadata Tables.
Downloads can take some time to generate for large collections.
The downloaded FASTA files are compressed using zip. Generally you'll need to unzip the bundle before using the files in other software.
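As a minimal sketch of the unzip step, the bundle can be extracted with Python's standard `zipfile` module. The function name and the paths passed to it are hypothetical; the layout of files inside the bundle is not described here, so the sketch simply returns whatever the archive contains.

```python
import zipfile
from pathlib import Path


def extract_fasta_bundle(bundle_path, out_dir):
    """Unzip a downloaded bundle and return the paths of the extracted files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(bundle_path) as bundle:
        bundle.extractall(out_dir)
        # Skip directory entries; keep only real files.
        names = [n for n in bundle.namelist() if not n.endswith("/")]
    return [out_dir / n for n in names]
```

Calling `extract_fasta_bundle("downloads.zip", "fasta_out")` (both names are placeholders) would return a list of paths that can be passed to other software.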
The original uploaded FASTA file for each assembly can be downloaded from the Selected Genomes popup and from the Metadata Tables in the Collection View.
A GFF format output of the core, MLST, and AMR results is available for each assembly from the Metadata Tables in the Collection View.
The score ({s}) is the percent identity to the reference allele.
The Name attribute is the Pathogenwatch gene family name.
Target is the reference allele identifier and the start-end of the match.
The score ({s}) is the percent identity to the reference allele.
The Name attribute is the Pathogenwatch gene family name.
Target is the reference allele identifier and the start-end of the match.
"_PAAR" are presence/absence matches.
"_SNPAR" are for gene variants.
Notes on "_SNPAR"
A "_SNPAR" result consists of two components: (1) the match to the SNPAR gene (type=CDS), and (2) the individual resistance mutations (type=point_mutation).
The CDS record describes the reference gene and match statistics. It can be followed by one or more mutation records, each describing an amino acid change that leads to resistance. Each mutation record should reference its parent CDS feature.
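The CDS/mutation relationship described above can be walked with a few lines of Python. This is a sketch under assumptions: it expects standard nine-column, tab-separated GFF3, and it assumes the CDS records carry an `ID` attribute that the mutation records name in `Parent`. The exact attribute keys in a Pathogenwatch GFF should be checked against a real download.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn 'ID=x;Name=y' into a dict, decoding GFF3 percent-escapes."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def group_mutations_by_cds(gff_lines):
    """Map each CDS ID to (CDS record, [point_mutation records naming it as Parent])."""
    genes = {}
    mutations = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        record = {
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "score": cols[5],
            "attrs": attrs,
        }
        # Assumption: CDS features carry an ID that mutations reference via Parent.
        if cols[2] == "CDS" and "ID" in attrs:
            genes[attrs["ID"]] = record
        elif cols[2] == "point_mutation" and "Parent" in attrs:
            mutations[attrs["Parent"]].append(record)
    return {gene_id: (gene, mutations.get(gene_id, [])) for gene_id, gene in genes.items()}
```

Passing an open file handle as `gff_lines` works, since the function only iterates over lines.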
Apart from the FASTA and GFF files, the rest of the downloads are plain-text CSV for easy use with other software.
The overall AMR profile, i.e. the antibiotics for which potential resistance genotypes have been identified for each assembly, is available from the Selected Genomes popup and the Collection View Downloads. The header consists of the genome ID, genome name, analysis module version and each antibiotic for that species. Possible values are "NOT_FOUND", "INTERMEDIATE" and "RESISTANT".
| Genome ID | Genome Name | Version | Amikacin | Penicillin | Tobramycin |
| --- | --- | --- | --- | --- | --- |
| 5af2b16630c1ce57566f4809 | 73_ES_2858 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
| 5af2b16630c1ce66cd6f480b | 79_PT_30 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
| 5af2b16930c1cead386f480d | 77_SE_2879 | v2 | NOT_FOUND | RESISTANT | NOT_FOUND |
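A short sketch of reading the AMR profile with Python's `csv` module follows. It assumes the CSV is comma-separated and that its header uses the same column names as the example above; the function name is hypothetical.

```python
import csv
import io


def resistant_antibiotics(csv_text, id_columns=("Genome ID", "Genome Name", "Version")):
    """Return {genome name: [antibiotics whose value is RESISTANT]}."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        result[row["Genome Name"]] = [
            column
            for column, value in row.items()
            if column not in id_columns and value == "RESISTANT"
        ]
    return result


# Two rows taken from the example output above.
example = """Genome ID,Genome Name,Version,Amikacin,Penicillin,Tobramycin
5af2b16630c1ce57566f4809,73_ES_2858,v2,NOT_FOUND,RESISTANT,NOT_FOUND
5af2b16630c1ce66cd6f480b,79_PT_30,v2,NOT_FOUND,RESISTANT,NOT_FOUND
"""
print(resistant_antibiotics(example))
```

To also report "INTERMEDIATE" calls, the comparison could be widened to `value in ("RESISTANT", "INTERMEDIATE")`.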
Downloads in the Collection View will reflect the current selection.
In the Collection View Downloads the presence/absence ("1"/"0") profile of AMR-associated sequence variants and AMR-associated genes can be downloaded. From both download menus, a CSV file of the resistance profile can be downloaded for selected assemblies. Example output for some Staphylococcus aureus assemblies can be seen in the table below.
| Genome ID | Genome Name | Version | Amikacin | Gentamicin | Tobramycin | Kanamycin | Methicillin | Penicillin | Fusidic Acid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
| 5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
| 5b30deacd9d9a8540c2aed2f | Cow_A.fasta | v2 | NOT_FOUND | NOT_FOUND | NOT_FOUND | NOT_FOUND | RESISTANT | RESISTANT | NOT_FOUND |
The cgMLST assignments for each assembly are available in CSV format from both download menus. Example output is shown below.
| Genome ID | Genome Name | Version | Gene | Allele ID | Start | End | Contig | Direction |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0001 | 2 | 145292 | 143931 | ERS049983.7092_7_81.4 | reverse |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0002 | 33 | 143651 | 142518 | ERS049983.7092_7_81.4 | reverse |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0003 | 2 | 142128 | 141892 | ERS049983.7092_7_81.4 | reverse |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | SAUR0004 | 2 | 141895 | 140783 | ERS049983.7092_7_81.4 | reverse |
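Because the cgMLST CSV has one row per gene per genome, it is often convenient to pivot it into one allele profile per genome. The sketch below does that and counts differing loci between two profiles. It assumes a comma-separated file with the "Genome Name", "Gene" and "Allele ID" columns shown above; the function names are hypothetical, and the simple difference count is an illustration, not the method Pathogenwatch uses for its trees.

```python
import csv
import io
from collections import defaultdict


def allele_profiles(csv_text):
    """Pivot long-format cgMLST rows into {genome name: {gene: allele ID}}."""
    profiles = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        profiles[row["Genome Name"]][row["Gene"]] = row["Allele ID"]
    return dict(profiles)


def count_differences(profile_a, profile_b):
    """Count loci present in both profiles whose allele IDs differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for gene in shared if profile_a[gene] != profile_b[gene])
```

Loci missing from either genome are ignored by `count_differences`, which is one possible convention among several.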
MLST assignments can be downloaded from both of the download menus. The source of the schema (e.g. PubMLST) is shown and linked to in the download menu. Example output for Staphylococcus aureus can be seen below:
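| Genome ID | Genome Name | Version | ST | arcC | aroE | glpF | gmk | pta | tpi | yqiL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
| 5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | 20180516172348-v1.6.1 | 130 | 6 | 57 | 45 | 2 | 7 | 58 | 52 |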
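In MLST, a sequence type (ST) corresponds to one combination of allele numbers across the scheme's loci. As a sketch, the rows of the CSV can be collapsed into the distinct allele profiles seen for each ST. The code assumes a comma-separated file with the column names shown above and the seven Staphylococcus aureus loci from the example; other species use different loci, so `LOCI` would need changing.

```python
import csv
import io

# Loci of the S. aureus scheme, as they appear in the example header.
LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")


def profiles_by_st(csv_text, loci=LOCI):
    """Map each ST to the set of allele profiles reported alongside it."""
    seen = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        profile = tuple(row[locus] for locus in loci)
        seen.setdefault(row["ST"], set()).add(profile)
    return seen
```

For the three example rows this yields a single entry for ST 130 with one profile, since all three genomes share the same alleles.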
Genotyphi is only run for Salmonella Typhi. Result CSVs are available from both the download menus. Example results are shown below.
| Genome ID | Genome Name | Version | Genotype | SNPs Called |
| --- | --- | --- | --- | --- |
| 5a69cfc256aeb700010dcffc | 007898.fasta | v2 | 4.3.1 | 68 |
| 5a69cfc456aeb700010dcffe | 404Ty.fasta | v2 | 3.1.2 | 68 |
| 5a69cfc456aeb700010dd000 | 11909_3.fasta | v2 | 2.0.2 | 69 |
| 5a69cffab0c5b70001796add | Ty2.fasta | v2 | 4.1 | 68 |
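The genotype labels in the example are dotted (for instance "4.3.1"). Assuming each dotted prefix names a progressively broader group, as in the GenoTyphi nomenclature, a small helper can expand a label into its levels for grouping genomes at a coarser resolution. The function name is hypothetical.

```python
def genotype_levels(genotype):
    """Expand a dotted genotype label into its prefixes, broadest first."""
    parts = genotype.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


print(genotype_levels("4.3.1"))  # ['4', '4.3', '4.3.1']
```

Grouping on `genotype_levels(g)[0]` would, under that assumption, bucket genomes by their broadest group.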
The detailed Speciator output can be downloaded from both the Collection View and the Selected Genomes popup. Example output for a set of Staphylococcus aureus assemblies can be seen below.
| Genome ID | Genome Name | Version | Organism Name | Organism ID | Species Name | Species ID | Genus Name | Genus ID | Reference ID | Matching Hashes | p-Value | Mash Distance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5b30dea5d9d9a8a7c12aed29 | Sheep_B2.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001197935.1 | 400/400 | 0 | 0 |
| 5b30dea6d9d9a8879a2aed2b | Sheep_B3.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001208645.1 | 399/400 | 0 | 7.82718E-05 |
| 5b30dea8d9d9a8380d2aed2d | Patient_A1.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001203975.1 | 400/400 | 0 | 0 |
| 5b30deacd9d9a8540c2aed2f | Cow_A.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001193975.1 | 399/400 | 0 | 7.82718E-05 |
| 5b30deaed9d9a846cf2aed35 | Patient_A2.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_000982735.1 | 400/400 | 0 | 0 |
| 5b30deaed9d9a8734e2aed31 | Patient_B.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001197935.1 | 400/400 | 0 | 0 |
| 5b30deaed9d9a8bca52aed33 | Sheep_B1.fasta | v1 | Staphylococcus aureus | 1280 | Staphylococcus aureus | 1280 | Staphylococcus | 1279 | GCF_001209085.1 | 400/400 | 0 | 0 |
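The "Matching Hashes" and "Mash Distance" columns are related through the standard Mash distance formula, D = -(1/k) ln(2j / (1 + j)), where j is the fraction of matching hashes and k is the k-mer length. The sketch below is an illustration only: the value k = 16 is an assumption, chosen because it reproduces the distance shown for the 399/400 rows in the example; it is not a documented Speciator setting.

```python
import math


def parse_matching_hashes(field):
    """'399/400' -> (399, 400)."""
    matching, total = field.split("/")
    return int(matching), int(total)


def mash_distance(matching, sketch_size, k=16):
    """Mash distance from shared hashes. k is an ASSUMED k-mer length."""
    if matching == 0:
        return 1.0  # no shared hashes: maximum distance
    if matching == sketch_size:
        return 0.0  # identical sketches
    jaccard = matching / sketch_size
    return -math.log(2 * jaccard / (1 + jaccard)) / k


matching, total = parse_matching_hashes("399/400")
print(f"{mash_distance(matching, total):.5E}")
```

With the assumed k, the 399/400 rows come out at about 7.827E-05, matching the example table to the precision shown.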
The assembly quality statistics calculated for each genome can be downloaded from the Collection View downloads and the Selected Genomes popup. An example of the fields and output is given below.
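| Genome Name | Version | Genome Length | No. Contigs | Smallest Contig | Largest Contig | Average Contig Length | N50 | non-ATCG | GC Content |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 73_ES_2858 | v2 | 2774944 | 53 | 605 | 239743 | 52357 | 117181 | 152 | 32.7 |
| 79_PT_30 | v2 | 2846170 | 46 | 398 | 336545 | 61873 | 110036 | 495 | 32.7 |
| 77_SE_2879 | v2 | 2787435 | 50 | 326 | 481787 | 55748 | 96906 | 150 | 32.7 |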
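For readers unfamiliar with the N50 field, the sketch below shows the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly length. This illustrates the common definition; the exact implementation used by Pathogenwatch is not described here.

```python
def n50(contig_lengths):
    """Contig length at which the running total, longest first, reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


print(n50([10, 8, 6, 4, 2]))  # 8
```

In the toy example the total is 30; the running total reaches 18 at the second contig, which is the first point at or past 15, so the N50 is 8.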
The presence/absence table of Pathogenwatch scheme core matches is available from the Collection View downloads. Family names are given in the column header, with one row per assembly. Each value is the proportion of the reference that the match overlaps, between 0 and 1 ("0" is absent, "1" is complete and present). Multiple hits to the same family are separated with semicolons.
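A cell in this table can therefore hold either a single number or several numbers joined by semicolons. A minimal sketch for parsing one cell follows; the function name is hypothetical, and treating an empty cell as absent is an assumption.

```python
def parse_core_cell(cell):
    """Return the overlap values of all hits in one cell; [] means the family is absent."""
    values = [float(v) for v in cell.split(";") if v.strip()]
    return [v for v in values if v > 0]


print(parse_core_cell("0.5;1"))  # [0.5, 1.0]
```

A family can then be treated as present when the returned list is non-empty, and as duplicated when it has more than one entry.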
The final score matrix used to build the neighbour-joining collection trees is available from the Collection View downloads. The score is the number of differences normalised by the percentage of the core matched, and so is typically almost identical to the number of differences.
The number of counted differences between each pair of assemblies in the current selection can be downloaded from the Collection View. For more details on how this is calculated see the Core Genome Tree section.
A summary of variance seen in the selected genomes, also sub-grouped by nearest reference, can be downloaded. An example of the output is given below.