
Public data downloads

Accessing complete species metadata, analysis and FASTA downloads


Last updated 1 year ago

NB This feature is currently in testing and may be replaced.

About the downloads

In order to facilitate access to the Pathogenwatch public data sets, we have exported all the metadata and analysis CSVs, along with the assembled genome FASTAs, to a public "S3" bucket on DigitalOcean.

The root bucket URL is https://pathogenwatch-public.ams3.cdn.digitaloceanspaces.com

File naming scheme

Species names contain characters that will need to be "URL encoded" for access. Examples of how to do this are given below.
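As a minimal sketch of what "URL encoding" means here, the function below percent-encodes every character that is not a letter, digit, or one of `- _ . ~`. The function name `urlencode` is a hypothetical helper, not part of Pathogenwatch, and the sketch assumes plain ASCII file names.

```shell
# Percent-encode a string so it can be appended to the bucket URL.
# Assumes ASCII input; urlencode is an illustrative helper name.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    ch=${s%"$rest"}        # the first character itself
    case $ch in
      [a-zA-Z0-9._~-]) out=$out$ch ;;                  # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$ch") ;;          # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

urlencode "Klebsiella pneumoniae__kleborate.csv.gz"
```

The space in the species name becomes `%20`, while the underscores and dots are left unchanged.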

Annotation files

Name format

<species name>__<tool name>.csv.gz

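Example file link: https://pathogenwatch-public.ams3.cdn.digitaloceanspaces.com/Klebsiella%20pneumoniae__kleborate.csv.gz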
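Going the other way, an annotation file name can be split back into its species and tool parts. This is only a sketch: `parse_annotation_name` is a hypothetical helper, and it assumes the tool name itself never contains a double underscore.

```shell
# Split "<species name>__<tool name>.csv.gz" into species and tool,
# printed as two tab-separated fields.
parse_annotation_name() {
  name=$1
  base=${name%.csv.gz}     # drop the extension
  species=${base%__*}      # everything before the last "__"
  tool=${base##*__}        # everything after the last "__"
  printf '%s\t%s\n' "$species" "$tool"
}

parse_annotation_name "Klebsiella pneumoniae__kleborate.csv.gz"
```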

FASTA files

Name format

<species name>__fasta.zip

Example file link: https://pathogenwatch-public.ams3.cdn.digitaloceanspaces.com/Klebsiella%20pneumoniae__fastas.zip

Using the downloads bucket

Via the browser

Getting the complete list of files

Click on the root bucket URL to view an XML text representation of all the available files.

Downloading an individual file

Use Ctrl-F/Cmd-F to search the page for the name of the species.

Paste the root bucket URL into a new tab, add a / at the end, and append the contents of the Key field (i.e. the file name inside <Key>[file name]</Key>). Your browser should then download the file automatically (tested in Chrome).

With cURL/jq/yq on the command line

Getting the complete list of files

curl https://pathogenwatch-public.ams3.cdn.digitaloceanspaces.com | xq '.ListBucketResult.Contents[].Key'
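If xq is not available, the keys can also be pulled out of the XML listing with grep and sed. This is a rough sketch rather than a real XML parser: `list_keys` is a hypothetical helper, it relies on `grep -o` (available in GNU and BSD grep), and it does not decode XML entities such as `&amp;`. The sample listing below is hand-written from the example file name on this page.

```shell
# Read a ListBucketResult XML document on stdin and print one key per line.
list_keys() {
  grep -o '<Key>[^<]*</Key>' | sed -e 's,^<Key>,,' -e 's,</Key>$,,'
}

# Hand-written sample standing in for the real bucket listing.
sample='<ListBucketResult><Contents><Key>Klebsiella pneumoniae__kleborate.csv.gz</Key></Contents></ListBucketResult>'
printf '%s\n' "$sample" | list_keys
```

Against the live bucket, the same function would be used as `curl -s <root bucket URL> | list_keys`.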

Downloading an individual file

jq is a command-line tool for processing JSON. Here it is used only for its @uri filter, which URL-encodes the file name. It can be installed easily on most systems.

Substitute the name of the file you wish to download into the command below.

curl -O https://pathogenwatch-public.ams3.cdn.digitaloceanspaces.com/$( printf "Klebsiella pneumoniae__kleborate.csv.gz" | jq -sRr '@uri' )
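After downloading, it can be worth checking that the file arrived intact. The sketch below tests the gzip archive and prints the CSV header row. `check_csv_gz` is a hypothetical helper, and the demo uses a small locally generated file as a stand-in for a real download.

```shell
# Verify that a .csv.gz file is a valid gzip archive, then print its header row.
check_csv_gz() {
  file=$1
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt or incomplete: $file" >&2
    return 1
  fi
  gzip -dc "$file" | head -n 1
}

# Demo on a generated stand-in file (not real Pathogenwatch data).
tmp=$(mktemp -d)
printf 'id,value\n1,2\n' | gzip > "$tmp/sample.csv.gz"
check_csv_gz "$tmp/sample.csv.gz"
rm -r "$tmp"
```

Note that `curl -O` does not URL-decode the file name, so the downloaded file is saved as `Klebsiella%20pneumoniae__kleborate.csv.gz`. Pass that name to the check, or use `curl -o` to choose a different local name.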

s3cmd

Getting the complete list of files

s3cmd --host ams3.cdn.digitaloceanspaces.com --host-bucket "%(bucket)s.ams3.cdn.digitaloceanspaces.com" ls s3://pathogenwatch-public | sed -re 's,\s+, ,g' | cut -f 4- -d ' '
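The listing above still carries the `s3://pathogenwatch-public/` prefix on each name. As a sketch, the hypothetical helper `species_files` below strips that prefix and keeps only the files for one species. It assumes each `s3cmd ls` line ends with the full `s3://` URI. The date and size in the sample line are made up for illustration.

```shell
# Read `s3cmd ls` output on stdin; print bare file names for one species.
species_files() {
  species=$1
  sed -e 's,^.*s3://pathogenwatch-public/,,' |
    awk -v p="${species}__" 'index($0, p) == 1'   # keep names starting with "<species>__"
}

# Sample line with a made-up date and size.
printf '2024-01-01 00:00   1234   s3://pathogenwatch-public/Klebsiella pneumoniae__kleborate.csv.gz\n' |
  species_files "Klebsiella pneumoniae"
```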

Downloading an individual file

s3cmd --host ams3.cdn.digitaloceanspaces.com --host-bucket "%(bucket)s.ams3.cdn.digitaloceanspaces.com" get "s3://pathogenwatch-public/Klebsiella pneumoniae__kleborate.csv.gz"

Downloading all the files

This will download all the files into the current directory.

s3cmd --host ams3.cdn.digitaloceanspaces.com --host-bucket "%(bucket)s.ams3.cdn.digitaloceanspaces.com" get s3://pathogenwatch-public/ --recursive

Other

xq is a tool for parsing XML from the yq set of tools. It can be easily installed on most systems.

The easiest tool for working with S3 buckets is the s3cmd tool. It supports browsing, downloading and syncing from S3 buckets in general.

There are also libraries supporting the S3 API in most programming languages and computation platforms (e.g. Nextflow).
