FAQ

Frequently asked questions and other tips for using Pathogenwatch.

I can't find what I'm looking for in the documentation. Can you help me?

Of course. If you can't find what you're looking for, and you've tried the search box in the top right corner, please email us.

Our email address: pathogenwatch@cgps.group

Can I delete my uploads?

Not directly at the moment, but if you contact us at pathogenwatch@cgps.group then we can remove them from Pathogenwatch.

Is it free?

Yes! Pathogenwatch is a completely free service. We do limit how big a collection you can create, but otherwise there are no current restrictions in place on use. Thanks to our funders for enabling us to provide a public service.

There is a fair share mechanism that aims to give everyone reasonable access and timely data. If you upload many genomes or reads files, you may find yourself waiting for access.

How do I find close relatives of my genome?

There are two ways you can search your own genomes and the Pathogenwatch public library:

  1. If there is a cgMLST scheme for the organism, open the genome's Genome Report by clicking on its name in either the Browser or a Collection, then click the "View Clusters" button. This returns all public and personal genomes linked to the query genome according to cgMLST-based clustering at the specified threshold.

  2. If there is a population tree available for the species, you can create a collection containing one or more assemblies. Each assembly will be placed into a subset of the population and a tree will be built. Population trees are restricted to a small number of species because the reference assignment method is not robust against lateral gene transfer.
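To illustrate what "linked to the query genome at the specified threshold" can mean in option 1, here is a minimal Python sketch. It assumes single-linkage clustering over pairwise allele differences, with missing loci ignored; the function names, the toy profiles and the handling of missing data are all hypothetical and are not taken from the Pathogenwatch code.

```python
from collections import deque


def allele_distance(profile_a, profile_b):
    """Count loci where both profiles have a called allele and the calls differ.

    Loci with a missing call (None) in either profile are skipped (an assumption).
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def cluster_of(query, profiles, threshold):
    """Return the names of all genomes reachable from `query` through a chain
    of pairwise links whose distance is <= threshold (single linkage)."""
    seen = {query}
    queue = deque([query])
    while queue:
        current = queue.popleft()
        for name, profile in profiles.items():
            if name in seen:
                continue
            if allele_distance(profiles[current], profile) <= threshold:
                seen.add(name)
                queue.append(name)
    return seen


# Toy cgMLST profiles: one allele number per locus (hypothetical data).
profiles = {
    "query": [1, 1, 1, 1, 1],
    "g1": [1, 1, 1, 1, 2],  # 1 difference from query
    "g2": [1, 1, 1, 3, 2],  # 2 differences from query, 1 from g1
    "g3": [5, 6, 7, 8, 9],  # differs from every other genome at all 5 loci
}

print(sorted(cluster_of("query", profiles, threshold=1)))
```

With a threshold of 1, g2 joins the cluster through g1 even though it differs from the query at two loci. This chaining is a property of single linkage and explains why a cluster can contain genomes further from the query than the threshold itself.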

I'm wondering about building trees with the public genomes ...

Go on ...

Have all the public genomes been assembled the same way?

No, there is some variation in how the public genomes have been assembled. This is due to a combination of historical and pragmatic reasons.

The first key reason is that not all genomes are sequenced using the same technology, so they require different assembly methods or may not require assembly at all. We also seek to include genomes published and provided by the community, and so cannot control the methods used in these cases.

Secondly, Pathogenwatch is a long-running resource, and best practice for genome assembly is fast-moving and varies from species to species. We have created a standard assembly pipeline, which we use to import Illumina paired-end whole genome sequences from the ENA and have made available through the upload page, but this pipeline does change over time. Many of the current best tools and methods were not available when Pathogenwatch started, and this statement will remain true as sequencing and assembly methods change.

Given also that the cost of rerunning the assemblies and all downstream analyses is prohibitive, we have focused on ensuring that genomes meet our quality standard metrics before we include them in the public data, rather than on the method by which they were produced.

Where can I see the pipeline and quality metrics?

The Pathogenwatch pipeline is available as open source code under the GPL v3 license from our GitLab repository, with a description of the outputs in the README.md. You can see metrics on each uploaded assembly in the individual genome reports, the "Stats" download and the "Stats" table within the collection viewer.

Doesn't this affect the results?

The majority of analyses run by Pathogenwatch will be largely unaffected by minor variations in a genome sequence, since they rely on detecting the presence of particular variants or genes, which are unlikely to appear as false positives. The differences between assemblies from different pipelines tend to lie outside the core genome, mostly in repeat regions, and so also tend not to have a big impact on how Pathogenwatch calculates trees or clusters. However, trees can be significantly affected if an assembly pipeline has introduced systematic false positive variants into the genome sequence.

We advise users to verify the quality of their assemblies if unusual results are found. In our experience, re-sequencing poor quality runs and using more sophisticated tree building methods to account for horizontal gene transfer have a greater impact on the topology and branch lengths of trees.

Do I need to make my own tree?

The SNP distance and neighbor-joining (NJ) method of tree building used for most Pathogenwatch species can be considered good enough for most purposes. We have extensively compared the resulting trees against independent publications and in-house datasets for multiple species, and can show good consistency with other classification schemes such as MLST. However, the method is designed for speed and scalability, and is sensitive to low-quality data and to the level of recombination present. If you need a precise, well-supported tree for drawing detailed conclusions, perhaps on transmission events between closely related strains, we would suggest at least using a maximum likelihood (ML) approach such as IQ-TREE or FastTree. In this circumstance, the Pathogenwatch tree is best considered a way of identifying the genomes to include in a more computationally intensive analysis.
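As a rough illustration of the distance step that feeds an NJ tree, the Python sketch below computes a pairwise SNP distance matrix from equal-length aligned sequences. It is a generic example and not the Pathogenwatch implementation: the function names, the toy alignment and the choice to ignore positions with ambiguous bases or gaps are all assumptions.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have an unambiguous base
    and the bases differ. Positions with N, gaps, etc. are ignored (assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def distance_matrix(alignment):
    """Build a symmetric matrix of pairwise SNP distances as nested dicts."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Toy alignment of three genomes (hypothetical data).
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACCTACNA",
}

matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, row)
```

A matrix like this is the input to a neighbor-joining algorithm. Because every difference is counted equally, variants introduced by recombination or by low-quality data inflate the distances directly, which is one reason an ML-based method is suggested above when fine detail matters.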

How does the website versioning system work?

Major releases

As of v13.0.0, changes to website functionality trigger a new major version. A major release can also include updates to the analyses run (minor updates), as well as website bug fixes or changes to the data presented or the layout (patches). For full details of any release, see the release notes.

Minor releases

Minor releases correspond to one or more updates to the analyses run by Pathogenwatch. Pathogenwatch tasks are versioned using an internal system corresponding to unique builds of Docker images. When an image is rebuilt, all relevant genomes are also updated. Minor releases can also include "patches".

Patch releases

Patch releases correspond to bug fixes in the website, modifications to the site layout or the specifics of which data are presented and how. They can also represent internal updates or the addition of new features for testing by selected users.
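The three release types map onto the three parts of a version number such as v13.0.0. The short Python sketch below shows one way to tell which kind of release separates two versions. The helper names are hypothetical, and every version number other than v13.0.0 is made up for illustration.

```python
def parse_version(text):
    """Split a version string such as 'v13.0.0' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def release_type(old, new):
    """Name the highest-level component that changed between two versions."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if new_parts <= old_parts:
        raise ValueError("new version must be later than old version")
    if new_parts[0] != old_parts[0]:
        return "major"  # website functionality changed
    if new_parts[1] != old_parts[1]:
        return "minor"  # analyses updated
    return "patch"  # bug fixes, layout or presentation changes


print(release_type("v13.0.0", "v14.0.0"))  # major
print(release_type("v13.0.0", "v13.1.0"))  # minor
print(release_type("v13.1.0", "v13.1.1"))  # patch
```

Because a higher-level release can bundle lower-level changes, the release type only tells you the most significant kind of change; the release notes remain the place to check what else was included.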

Public data releases

The addition of new genomes to the public data set is currently not specifically versioned, but is announced in the release channel linked to the current version.
