Downloads

Accessing the Downloads

All metadata and calculation results produced by Pathogenwatch can be downloaded in various formats. Most downloads can be reached in one of two ways: select genomes in the Genome Browser view, click the "Selected Genomes" button in the top right corner and then "Downloads"; or click the "Downloads" button in the Collection View. In addition, assembly FASTA and GFF files can be obtained by clicking the icons in the leftmost column of the Metadata Tables.

Downloads can take some time to generate for large collections.

Available Downloads

The downloaded FASTA files are compressed using zip. Generally you'll need to unzip the bundle before using the files in other software.
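
As a minimal sketch, the bundle can be unpacked with Python's standard zipfile module. The function name extract_fasta_bundle and the file names in the usage note are placeholders chosen for illustration, not names that Pathogenwatch guarantees.

```python
import zipfile
from pathlib import Path


def extract_fasta_bundle(archive_path, out_dir):
    """Unpack a zip bundle and return the paths of the extracted files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as bundle:
        bundle.extractall(out)
        names = bundle.namelist()
    # Skip directory entries, which end with a slash inside the archive.
    return [out / name for name in names if not name.endswith("/")]
```

For example, extract_fasta_bundle("genomes.zip", "fasta") would place the FASTA files in a local "fasta" directory, assuming the download was saved as "genomes.zip".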

FASTA Files

The original uploaded FASTA files for each assembly. The FASTA downloads can be accessed from the Selected Genomes popup and from the Metadata Tables in the Collection View.

GFF Files

A GFF format output of the core, MLST, and AMR results is available for each assembly from the Metadata Tables in the Collection View.

Core Matches

{seqid}	{source}	{type}	{start}	{end}	{s}	{strand}	{phase}	{attributes}
##gff-version 3.2.1
ContigA	Pathogenwatch_Core	CDS_region	290	1000	90.2	+	0	ID=ContigA_290-1000;Name=fskA;Target=FskA 1 710;TargetLength=710;Note=Complete Match;Evalue=1e-100
ContigB	Pathogenwatch_Core	CDS_region	32900	35301	86.03	-	1	ID=ContigB_32900-35301;Name=grpA;Target=GrpA 2 2401;TargetLength=2499;Note=Partial Match;Evalue=1e-140

  • The score ({s}) is the percent identity to the reference allele.

  • The Name attribute is the Pathogenwatch gene family name.

  • Target is the reference allele identifier and the start-end of the match.
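
To illustrate how these fields fit together, the following Python sketch parses one of the example lines above and derives the fraction of the reference allele covered by the match. It assumes that columns are tab-separated, as GFF3 requires, and that coverage can be taken as the Target span divided by TargetLength. That definition is an assumption made for illustration and is not confirmed by Pathogenwatch. The function names parse_gff_line and target_coverage are hypothetical.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into its nine columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = dict(item.split("=", 1) for item in attrs.split(";") if item)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),  # percent identity
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


def target_coverage(feature):
    """Fraction of the reference allele covered by the match (assumed definition)."""
    # Target has the form "<reference id> <start> <end>".
    _, t_start, t_end = feature["attributes"]["Target"].split()[:3]
    length = int(feature["attributes"]["TargetLength"])
    return (int(t_end) - int(t_start) + 1) / length


line = (
    "ContigB\tPathogenwatch_Core\tCDS_region\t32900\t35301\t86.03\t-\t1\t"
    "ID=ContigB_32900-35301;Name=grpA;Target=GrpA 2 2401;"
    "TargetLength=2499;Note=Partial Match;Evalue=1e-140"
)
feature = parse_gff_line(line)
print(feature["attributes"]["Name"], feature["score"], round(target_coverage(feature), 3))
```

Under this assumed definition, the grpA record covers 2400 of 2499 reference positions, which is consistent with its "Partial Match" note.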

MLST Matches

{seqid}	{source}	{type}	{start}	{end}	{s}	{strand}	{phase}	{attributes}
ContigA	Pathogenwatch_MLST	genetic_marker	500	799	98.5	+	1	ID=MLST_ContigA_500-799;Name=arcc;Target=arcc_1001 1 300;Note=Allele 1001;Evalue=1e-120

  • The score ({s}) is the percent identity to the reference allele.

  • The Name attribute is the Pathogenwatch gene family name.

  • Target is the reference allele identifier and the start-end of the match.

AMR Matches

{seqid}	{source}	{type}	{start}	{end}	{s}	{strand}	{phase}	{attributes}
ContigA	Pathogenwatch_PAAR	CDS	300	599	87.5	+	1	ID=PAAR_ContigA_300-599;Name=mecA;Target=mecA 1 399;Note=Methicillin;Evalue=1e-100;TargetLength=399
ContigB	Pathogenwatch_SNPAR	CDS	200	499	99.5	+	1	ID=SNPAR_23S_RNA_1;Name=23S_RNA_1;Target=23S_RNA 1 399;Evalue=1e-120;TargetLength=399
ContigB	Pathogenwatch_SNPAR	point_mutation	251	251	.	+	.	ID=SNPAR_23S_RNA_1_T52A;Name=23S_RNA_T52A;Parent=SNPAR_23S_RNA_1;Note=Erythromycin,Ciprofloxacin
ContigB	Pathogenwatch_SNPAR	CDS	1000	3000	99.0	+	1	ID=SNPAR_cpA_1;Name=cipA_1;Target=cipA1 1 2001;Evalue=1e-140;TargetLength=2001
ContigB	Pathogenwatch_SNPAR	point_mutation	1784	1784	.	+	.	ID=SNPAR_cpA_1_A262P;Name=cpA_A262P;Parent=SNPAR_cpA_1;Note=Cipidydoxin
  • Sources ending in "_PAAR" are presence/absence matches.

  • Sources ending in "_SNPAR" are matches to gene variants.

Notes on "_SNPAR"

  • A "_SNPAR" result consists of two components: (1) the match to the SNPAR gene (type=CDS) and (2) the individual resistance mutations (type=point_mutation).

  • The CDS record describes the reference gene and the match statistics. It can be accompanied by one or more mutation records, each describing an amino acid change that leads to resistance. Each mutation record should refer to its parent CDS feature through the Parent attribute.
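
The parent/child relationship described above can be followed programmatically. The Python sketch below groups point_mutation records under the CDS record named in their Parent attribute, using two of the example records from this section. It assumes tab-separated columns and that the Note attribute of a mutation is a comma-separated list of antibiotics, as the examples suggest. The function names are hypothetical.

```python
from collections import defaultdict

# Two of the example records above, joined with real tab characters.
GFF_TEXT = "\n".join([
    "\t".join(["ContigB", "Pathogenwatch_SNPAR", "CDS", "200", "499", "99.5", "+", "1",
               "ID=SNPAR_23S_RNA_1;Name=23S_RNA_1;Target=23S_RNA 1 399;"
               "Evalue=1e-120;TargetLength=399"]),
    "\t".join(["ContigB", "Pathogenwatch_SNPAR", "point_mutation", "251", "251", ".", "+", ".",
               "ID=SNPAR_23S_RNA_1_T52A;Name=23S_RNA_T52A;Parent=SNPAR_23S_RNA_1;"
               "Note=Erythromycin,Ciprofloxacin"]),
])


def parse_attributes(column):
    """Turn 'key=value;key=value' into a dict."""
    return dict(item.split("=", 1) for item in column.split(";") if item)


def mutations_by_gene(gff_text):
    """Map each SNPAR CDS ID to the point mutations that name it as Parent."""
    genes = []
    mutations = defaultdict(list)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version
        cols = line.split("\t")
        if not cols[1].endswith("_SNPAR"):
            continue  # presence/absence (_PAAR) records carry no mutations
        attrs = parse_attributes(cols[8])
        if cols[2] == "CDS":
            genes.append(attrs["ID"])
        elif cols[2] == "point_mutation":
            mutations[attrs["Parent"]].append({
                "name": attrs["Name"],
                "position": int(cols[3]),
                "antibiotics": [a for a in attrs.get("Note", "").split(",") if a],
            })
    return {gene_id: mutations.get(gene_id, []) for gene_id in genes}


print(mutations_by_gene(GFF_TEXT))
```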

Apart from the FASTA and GFF files, the rest of the downloads are plain text CSV for easy use with other software.

AMR

AMR Profile

The overall AMR profile, i.e. the antibiotics for which potential resistance genotypes have been identified in each assembly, is available from the Selected Genomes popup and the Collection View Downloads. The header consists of the genome ID, genome name, analysis module version and one column per antibiotic for that species. Possible values are "NOT_FOUND", "INTERMEDIATE" and "RESISTANT".

Genome ID                 Genome Name  Version  Amikacin   Penicillin  Tobramycin
5af2b16630c1ce57566f4809  73_ES_2858   v2       NOT_FOUND  RESISTANT   NOT_FOUND
5af2b16630c1ce66cd6f480b  79_PT_30     v2       NOT_FOUND  RESISTANT   NOT_FOUND
5af2b16930c1cead386f480d  77_SE_2879   v2       NOT_FOUND  RESISTANT   NOT_FOUND

Downloads in the Collection View will reflect the current selection.
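
As an illustration of working with this file, the Python sketch below lists the antibiotics flagged as "RESISTANT" or "INTERMEDIATE" for each genome. It assumes the CSV column headers match the table above exactly; the sample text is built from the first two example rows, and the function name flagged_antibiotics is hypothetical.

```python
import csv
import io

# Sample text assembled from the example rows above (assumed CSV layout).
SAMPLE = """\
Genome ID,Genome Name,Version,Amikacin,Penicillin,Tobramycin
5af2b16630c1ce57566f4809,73_ES_2858,v2,NOT_FOUND,RESISTANT,NOT_FOUND
5af2b16630c1ce66cd6f480b,79_PT_30,v2,NOT_FOUND,RESISTANT,NOT_FOUND
"""

METADATA_COLUMNS = {"Genome ID", "Genome Name", "Version"}
FLAGGED = {"RESISTANT", "INTERMEDIATE"}


def flagged_antibiotics(handle):
    """Return {genome name: {antibiotic: call}} for calls other than NOT_FOUND."""
    profile = {}
    for row in csv.DictReader(handle):
        profile[row["Genome Name"]] = {
            drug: call
            for drug, call in row.items()
            if drug not in METADATA_COLUMNS and call in FLAGGED
        }
    return profile


print(flagged_antibiotics(io.StringIO(SAMPLE)))
```

To read a downloaded file instead of the sample text, pass an open file handle, for example open(path, newline="").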

AMR SNPs & Genes

The presence/absence ("1"/"0") profile of AMR-associated sequence variants and AMR-associated genes can be downloaded from the Collection View Downloads. A CSV file of the resistance profile for the selected assemblies can be downloaded from both download menus. Example resistance profile output for some Staphylococcus aureus assemblies is shown in the table below.

Genome ID                 Genome Name       Version  Amikacin   Gentamicin  Tobramycin  Kanamycin  Methicillin  Penicillin  Fusidic Acid
5b30dea5d9d9a8a7c12aed29  Sheep_B2.fasta    v2       NOT_FOUND  NOT_FOUND   NOT_FOUND   NOT_FOUND  RESISTANT    RESISTANT   NOT_FOUND
5b30dea6d9d9a8879a2aed2b  Sheep_B3.fasta    v2       NOT_FOUND  NOT_FOUND   NOT_FOUND   NOT_FOUND  RESISTANT    RESISTANT   NOT_FOUND
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  v2       NOT_FOUND  NOT_FOUND   NOT_FOUND   NOT_FOUND  RESISTANT    RESISTANT   NOT_FOUND
5b30deacd9d9a8540c2aed2f  Cow_A.fasta       v2       NOT_FOUND  NOT_FOUND   NOT_FOUND   NOT_FOUND  RESISTANT    RESISTANT   NOT_FOUND

cgMLST

The cgMLST assignments for each assembly are available in CSV format from both download menus. Example output is shown below.

Genome ID                 Genome Name       Version                Gene      Allele ID  Start   End     Contig                 Direction
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  20180516172348-v1.6.1  SAUR0001  2          145292  143931  ERS049983.7092_7_81.4  reverse
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  20180516172348-v1.6.1  SAUR0002  33         143651  142518  ERS049983.7092_7_81.4  reverse
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  20180516172348-v1.6.1  SAUR0003  2          142128  141892  ERS049983.7092_7_81.4  reverse
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  20180516172348-v1.6.1  SAUR0004  2          141895  140783  ERS049983.7092_7_81.4  reverse
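
Note that in the example rows Start is larger than End for matches in the reverse direction. The Python sketch below collects the allele calls per genome and normalises the coordinates so that the lower position comes first. It assumes the CSV headers match the column names shown above, that the first column is headed "Genome ID", and that coordinates are inclusive. The function name load_cgmlst is hypothetical.

```python
import csv
import io

# Sample text assembled from two of the example rows above (assumed CSV layout).
SAMPLE = """\
Genome ID,Genome Name,Version,Gene,Allele ID,Start,End,Contig,Direction
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,SAUR0001,2,145292,143931,ERS049983.7092_7_81.4,reverse
5b30dea8d9d9a8380d2aed2d,Patient_A1.fasta,20180516172348-v1.6.1,SAUR0003,2,142128,141892,ERS049983.7092_7_81.4,reverse
"""


def load_cgmlst(handle):
    """Return {genome name: {gene: allele call with normalised coordinates}}."""
    profiles = {}
    for row in csv.DictReader(handle):
        start, end = int(row["Start"]), int(row["End"])
        low, high = min(start, end), max(start, end)
        profiles.setdefault(row["Genome Name"], {})[row["Gene"]] = {
            "allele": row["Allele ID"],
            "contig": row["Contig"],
            "direction": row["Direction"],
            "low": low,
            "high": high,
            "length": high - low + 1,  # assumes inclusive coordinates
        }
    return profiles


profiles = load_cgmlst(io.StringIO(SAMPLE))
print(profiles["Patient_A1.fasta"]["SAUR0001"])
```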

MLST

MLST assignments can be downloaded from both of the download menus. The source of the schema (e.g. PubMLST) is shown and linked to in the download menu. Example output for Staphylococcus aureus can be seen below:

Genome ID                 Genome Name       Version                ST   arcC  aroE  glpF  gmk  pta  tpi  yqiL
5b30dea5d9d9a8a7c12aed29  Sheep_B2.fasta    20180516172348-v1.6.1  130  6     57    45    2    7    58   52
5b30dea6d9d9a8879a2aed2b  Sheep_B3.fasta    20180516172348-v1.6.1  130  6     57    45    2    7    58   52
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  20180516172348-v1.6.1  130  6     57    45    2    7    58   52

Genotyphi

Genotyphi is only run for Salmonella Typhi. Result CSVs are available from both download menus. Example results are shown below.

Genome ID                 Genome Name    Version  Genotype  SNPs Called
5a69cfc256aeb700010dcffc  007898.fasta   v2       4.3.1     68
5a69cfc456aeb700010dcffe  404Ty.fasta    v2       3.1.2     68
5a69cfc456aeb700010dd000  11909_3.fasta  v2       2.0.2     69
5a69cffab0c5b70001796add  Ty2.fasta      v2       4.1       68

Speciation

The detailed Speciator output can be downloaded from both the Collection View and the Selected Genomes popup. Example output for a set of Staphylococcus aureus assemblies can be seen below.

Genome ID                 Genome Name       Version  Organism Name          Organism ID  Species Name           Species ID  Genus Name      Genus ID  Reference ID     Matching Hashes  p-Value  Mash Distance
5b30dea5d9d9a8a7c12aed29  Sheep_B2.fasta    v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_001197935.1  400/400          0        0
5b30dea6d9d9a8879a2aed2b  Sheep_B3.fasta    v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_001208645.1  399/400          0        7.82718E-05
5b30dea8d9d9a8380d2aed2d  Patient_A1.fasta  v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_001203975.1  400/400          0        0
5b30deacd9d9a8540c2aed2f  Cow_A.fasta       v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_001193975.1  399/400          0        7.82718E-05
5b30deaed9d9a846cf2aed35  Patient_A2.fasta  v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_000982735.1  400/400          0        0
5b30deaed9d9a8734e2aed31  Patient_B.fasta   v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_001197935.1  400/400          0        0
5b30deaed9d9a8bca52aed33  Sheep_B1.fasta    v1       Staphylococcus aureus  1280         Staphylococcus aureus  1280        Staphylococcus  1279      GCF_001209085.1  400/400          0        0
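
The Mash Distance column can be related to the Matching Hashes column through the distance formula published with Mash, D = -(1/k) ln(2j / (1 + j)), where j is the fraction of shared hashes and k is the k-mer size. The Python sketch below applies that formula. The k-mer size of 16 is an assumption: it is not stated here, but it reproduces the 7.82718E-05 shown above for 399/400 matching hashes. The function names are hypothetical.

```python
import math


def parse_matching_hashes(value):
    """Split a 'shared/total' string such as '399/400' into two integers."""
    shared, total = value.split("/")
    return int(shared), int(total)


def mash_distance(shared_hashes, sketch_size, kmer_size=16):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), with j = shared / sketch size.

    kmer_size=16 is an assumed value, chosen because it matches the example table.
    """
    j = shared_hashes / sketch_size
    if j == 0:
        return 1.0  # the logarithm is undefined; Mash reports 1 in this case
    if j == 1:
        return 0.0  # identical sketches
    return -math.log(2 * j / (1 + j)) / kmer_size


print(mash_distance(*parse_matching_hashes("399/400")))
print(mash_distance(*parse_matching_hashes("400/400")))
```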

Stats

The assembly quality statistics calculated for each genome can be downloaded from the Collection View downloads and the Selected Genomes popup. An example of the fields and output is given below.

Genome Name  Version  Genome Length  No. Contigs  Smallest Contig  Largest Contig  Average Contig Length  N50     non-ATCG  GC Content
73_ES_2858   v2       2774944        53           605              239743          52357                  117181  152       32.7
79_PT_30     v2       2846170        46           398              336545          61873                  110036  495       32.7
77_SE_2879   v2       2787435        50           326              481787          55748                  96906   150       32.7
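
For readers unfamiliar with these fields, the Python sketch below computes several of them from a list of contig lengths using the standard definition of N50: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly length. This is a generic illustration, not the Pathogenwatch implementation, and the rounding of the average is an assumption. In the example rows, the average contig length agrees with genome length divided by number of contigs (2774944 / 53 is approximately 52357).

```python
def n50(contig_lengths):
    """Contig length at which the running total first reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


def assembly_stats(contig_lengths):
    """Summarise an assembly using the field names from the table above."""
    return {
        "Genome Length": sum(contig_lengths),
        "No. Contigs": len(contig_lengths),
        "Smallest Contig": min(contig_lengths),
        "Largest Contig": max(contig_lengths),
        # Rounding to the nearest integer is an assumption.
        "Average Contig Length": round(sum(contig_lengths) / len(contig_lengths)),
        "N50": n50(contig_lengths),
    }


# Toy contig lengths, purely illustrative.
print(assembly_stats([10, 8, 5, 3, 2]))
```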

Core Allele Distribution

The presence/absence table of Pathogenwatch scheme core matches is available from the Collection View downloads. Family names are given in the column headers, with one row per assembly. The overlap with the reference is given as a fraction between 0 and 1 ("0" is absent, "1" is complete and present). Multiple hits to the same family are separated by semicolons.
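
A cell in this table can therefore hold either a single value or several values separated by semicolons. As a minimal sketch, assuming each cell contains only numbers and semicolons, the hypothetical helper below converts a cell into a list of overlap values and reports whether the family is present.

```python
def parse_overlaps(cell):
    """Split a cell such as '0.98;1' into a list of floats."""
    return [float(part) for part in cell.split(";") if part.strip()]


def is_present(cell):
    """A family counts as present if any hit has an overlap above zero."""
    return any(value > 0 for value in parse_overlaps(cell))


# Illustrative cell values, not taken from a real download.
print(parse_overlaps("0.98;1"), is_present("0.98;1"), is_present("0"))
```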

Score Matrix

The final score matrix used to build the neighbour-joining collection trees is available from the Collection View downloads. The score is the number of differences normalised by the percentage of the core matched, so it is typically almost identical to the number of differences.

Difference Matrix

The number of counted differences between each pair of assemblies in the current selection can be downloaded from the Collection View. For more details on how this is calculated see the Core Genome Tree section.

Variance Summary

A summary of variance seen in the selected genomes, also sub-grouped by nearest reference, can be downloaded. An example of the output is given below.
