Short Read Assembly

How Pathogenwatch assembles short read Illumina data

Last updated 11 months ago

Primarily for the benefit of smaller labs or individual researchers without bioinformatics resources, we also provide a small-scale assembly service. Please be aware that resources are limited, and large submissions are likely to take a long time to process.

We're using an assembly pipeline developed by our Global Health Research Unit, based on SPAdes. The pipeline is built using Nextflow, allowing it to run locally, on a cluster, or in the cloud. Find out more on the pipeline project page.

The assembly pipeline is run using the Pathogenwatch runner system and is included in the user fair-share calculations. Assembly tasks are heavily weighted in these calculations due to their large resource requirements, so if you are running them you may not see other submissions running for a while.

Usage

Start by visiting the upload page at https://pathogen.watch/upload and select "Short Read Assembly". If you don't have an account, you can create one quickly using your email address or a social media account.

Drag your reads onto the page and they will start being uploaded.

You will start to see progress as your genomes are assembled and analysed.

It may take some time before your genomes start being assembled. During this time we are automatically creating new servers.

When your genomes have finished being analysed you can view the results as normal. You can also download your assemblies from the genomes page if you want to analyse them locally.

Expected results

Assembling and analysing one genome typically takes 15-45 minutes, but it can take longer depending on the species and how busy the servers are.

Pipeline details

You can find more details of the pipeline on GitLab: https://gitlab.com/cgps/ghru/pipelines/assembly

At time of writing, we run version 1.3.1 of the pipeline with the following command:

nextflow run assembly.nf \
  --input_dir DIRECTORY_IN_S3 \
  --fastq_pattern '*_{1,2}.fastq.gz' \
  --output_dir DIRECTORY_IN_S3 \
  --adapter_file adapters.fas \
  --depth_cutoff 100 \
  --qc_conditions qc_conditions_nextera.yml \
  --prescreen_size_check 20000000

The pipeline dependencies can be found in the docker container bioinformant/ghru-assembly:1.3.1. We provide additional arguments to run the pipeline on our infrastructure which will not affect the end results.
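For readers who want to reproduce the invocation locally, the command above can be assembled programmatically. The following is a minimal sketch, not part of the Pathogenwatch service: the function name `build_assembly_command` and the example directories `reads/` and `assemblies/` are hypothetical, and it assumes that `assembly.nf`, `adapters.fas` and `qc_conditions_nextera.yml` are present in the working directory. Pathogenwatch itself reads from and writes to S3, so local paths are an assumption here.

```python
import shlex


def build_assembly_command(input_dir, output_dir):
    """Return the documented pipeline invocation as an argument list.

    All flags and values are copied from the command shown above; only
    the input and output directories are supplied by the caller.
    """
    return [
        "nextflow", "run", "assembly.nf",
        "--input_dir", input_dir,
        "--fastq_pattern", "*_{1,2}.fastq.gz",
        "--output_dir", output_dir,
        "--adapter_file", "adapters.fas",
        "--depth_cutoff", "100",
        "--qc_conditions", "qc_conditions_nextera.yml",
        "--prescreen_size_check", "20000000",
    ]


# Hypothetical local directories, used purely for illustration.
command = build_assembly_command("reads/", "assemblies/")

# shlex.join quotes the glob pattern so it is safe to paste into a shell.
print(shlex.join(command))
```

Building the command as a list avoids shell-quoting mistakes with the `*_{1,2}.fastq.gz` pattern; the same list could be passed directly to `subprocess.run` without going through a shell.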

In due course we will add additional QC checks and present errors to the user.

Constraints

We only support paired Illumina read data in .fastq.gz format. We welcome feedback on other sequencing platforms and suitable assembly pipelines.
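Before uploading, it can help to confirm that every sample has both read files. The sketch below groups file names into pairs using the `*_{1,2}.fastq.gz` pattern from the pipeline command. It is an illustration only: the helper name `pair_reads` is hypothetical, and whether the upload page enforces exactly this naming scheme is an assumption rather than something stated here.

```python
import re
from collections import defaultdict

# Mirrors the '*_{1,2}.fastq.gz' pattern passed to the pipeline.
READ_FILE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def pair_reads(filenames):
    """Group read files into complete pairs.

    Returns (pairs, problems): pairs maps a sample name to its
    (forward, reverse) file names; problems lists files that do not
    match the pattern or whose mate is missing.
    """
    mates = defaultdict(dict)
    problems = []
    for name in filenames:
        match = READ_FILE.match(name)
        if match is None:
            problems.append(name)
            continue
        mates[match["sample"]][match["mate"]] = name

    pairs = {}
    for sample, found in mates.items():
        if set(found) == {"1", "2"}:
            pairs[sample] = (found["1"], found["2"])
        else:
            # Only one mate was seen for this sample.
            problems.extend(found.values())
    return pairs, sorted(problems)


# Example file names, invented for illustration.
pairs, problems = pair_reads(
    ["isolateA_1.fastq.gz", "isolateA_2.fastq.gz", "isolateB_1.fastq.gz"]
)
print("complete pairs:", pairs)
print("needs attention:", problems)
```

Any file reported under "needs attention" is either not a gzipped FASTQ with a `_1` or `_2` suffix or is missing its mate.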

Because the pipeline is based on SPAdes, we expect to receive bacterial or viral genomes. The only hard constraint is that the organism should have a predicted genome length of less than 20,000,000 bases.

Fair Usage Policy

We apply fair usage constraints to all our tasks through a single integrated queuing system. Assembly tasks require significant resources and so are weighted more heavily than other tasks. If you want to process more than a few (e.g. 10) genomes, they will progress much more quickly if they are already assembled. Because our resources are openly shared with anyone who wants to use them, users uploading assembled genomes will tend to get priority and move ahead in the queue.

How to cite

The SPAdes assembler: Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020;70(1):e102. doi:10.1002/cpbi.102

The pipeline is available under an OSS licence from https://gitlab.com/cgps/ghru/pipelines/assembly