# The Real Time Assembly Pipeline

## About

The Centre for Genomic Pathogen Surveillance runs a genome assembly pipeline that provides daily updates for the [WHO priority pathogens](/pathogenwatch/old-who-bacterial-priority-pathogens.md), along with a few other key species. The aim is to provide real-time support for local outbreak detection and characterisation within the global context, encouraging and maximising the value of rapid release of genomics data in the public archives. The genomes produced by this pipeline are automatically imported into Pathogenwatch and made freely available for community use.

## Viewing the data

Species supported by the "Always on" pipeline are marked with a check mark on the right-hand side of the [Genomes Browser](/pathogenwatch/how-to-use-pathogenwatch/browsing-and-viewing-genomes.md) drop-down menus. Within a specific species browser, the public genomes can be viewed by selecting the "Always on" folder in the Folder filter on the left-hand side.

<figure><img src="/files/jbdcGZUBL1XYkzo5IxGl" alt="" width="563"><figcaption><p>Species that are updated daily are indicated with checkmarks in the drop down menus</p></figcaption></figure>

<figure><img src="/files/qveTr3Bar4FTJMavg7S8" alt=""><figcaption><p>Within a species the Folder filter can be used to just select the public data</p></figcaption></figure>

## The assembly pipeline

### Implementation

The CGPS pipeline uses [the SPAdes assembler](https://github.com/ablab/spades) \[1] along with a set of assembly QC and check tools. The pipeline was written by Anthony Underwood and is available from [our code repository](https://gitlab.com/cgps/ghru/pipelines/dsl2/pipelines/assembly). For full details, please check the [README](https://gitlab.com/cgps/ghru/pipelines/dsl2/pipelines/assembly/-/blob/master/README.md?ref_type=heads).

{% hint style="info" %}
AMR.watch uses the assemblies and results produced by Pathogenwatch. To see summaries of the number of genomes imported and the impact of the different filters, visit <https://amr.watch/summary> and <https://amr.watch/summary/all>.
{% endhint %}

### Selection Constraints

The pipeline is currently restricted to short read assembly, and is focused on genomic epidemiology rather than complete coverage of species diversity. A genome will be selected for assembly provided it:

1. Has been assigned to one of the priority species
2. Meets assembly requirements:
   1. Illumina paired-end whole genome sequence
   2. Two FASTQs present
   3. \>20x mean coverage according to the base count
   4. Available for download from the SRA
3. Meets minimum metadata requirements:
   1. Has at least the year as sample date
   2. The location can be resolved to a country
   3. Associated with a single sample accession
4. Is the only representative for that sample:
   1. The sample doesn't have a representative already
   2. If there is more than one possibility, it is the FASTQ pair with the most coverage
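
The selection rules above can be sketched as a small filter. This is an illustrative sketch only, not the pipeline's actual code: the `Run` record, its field names, the controlled-vocabulary values (`"ILLUMINA"`, `"PAIRED"`, `"WGS"`) and the use of an expected genome size to turn the base count into mean coverage are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Run:
    """Hypothetical record for one archived sequencing run (illustrative fields)."""
    run_accession: str
    sample_accessions: List[str]
    species: str
    platform: str                  # assumed value, e.g. "ILLUMINA"
    layout: str                    # assumed value, e.g. "PAIRED"
    strategy: str                  # assumed value, e.g. "WGS"
    fastq_count: int
    base_count: int                # total sequenced bases reported for the run
    genome_size: int               # assumed expected genome length for the species
    collection_year: Optional[int]
    country: Optional[str]
    downloadable: bool


def mean_coverage(run: Run) -> float:
    # Assumption: mean coverage is estimated as base count / expected genome size.
    return run.base_count / run.genome_size


def is_eligible(run: Run, priority_species: Set[str], min_coverage: float = 20.0) -> bool:
    """Apply criteria 1 to 3: species, assembly requirements, minimum metadata."""
    return (
        run.species in priority_species
        and run.platform == "ILLUMINA"
        and run.layout == "PAIRED"
        and run.strategy == "WGS"
        and run.fastq_count == 2
        and mean_coverage(run) > min_coverage
        and run.downloadable
        and run.collection_year is not None
        and run.country is not None
        and len(run.sample_accessions) == 1
    )


def select_representatives(
    runs: Iterable[Run],
    priority_species: Set[str],
    already_assembled: Set[str],
) -> Dict[str, Run]:
    """Apply criterion 4: keep one run per sample, preferring the highest coverage."""
    best: Dict[str, Run] = {}
    for run in runs:
        if not is_eligible(run, priority_species):
            continue
        sample = run.sample_accessions[0]
        if sample in already_assembled:
            # The sample already has a representative, so skip it.
            continue
        current = best.get(sample)
        if current is None or mean_coverage(run) > mean_coverage(current):
            best[sample] = run
    return best
```

Calling `select_representatives(runs, priority_species, already_assembled)` returns a mapping from sample accession to the single run that would be put forward for assembly under these assumptions.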

{% hint style="warning" %}
Genomes with updated metadata are not automatically detected and imported; they are added manually instead. If you know of genomes that have been updated with sample date or location data and now meet the requirements, please [let us know](/pathogenwatch/report-an-issue.md) and we will add them.
{% endhint %}

## Citations

\[1] - [Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020). Using SPAdes de novo assembler. Current Protocols in Bioinformatics, 70, e102. doi: 10.1002/cpbi.102](https://currentprotocols.onlinelibrary.wiley.com/doi/10.1002/cpbi.102)

