cgMLST Clustering


Last updated 11 months ago

About

Genomes from species with a cgMLST scheme can be clustered from their genome report page. Clustering helps to identify closely related genomes, which may indicate a transmission event or outbreak.

Methods

cgMLST profiles are calculated for assemblies when they are uploaded, provided a suitable scheme is available. Pairwise distances are then calculated between all assemblies that share a given cgMLST scheme. The distance between two assemblies is the number of loci in the scheme at which their alleles differ, ignoring any loci that are missing (possibly due to sequencing or assembly errors).
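The distance described above can be sketched in a few lines of Python. This is an illustration only, not the Pathogenwatch implementation: the `Profile` type, the function name `cgmlst_distance`, and the use of `None` (or an absent key) to mark a missing locus are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical representation: locus name -> allele identifier,
# with None marking a locus that could not be called.
Profile = Dict[str, Optional[str]]


def cgmlst_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have an allele call and the calls differ."""
    distance = 0
    # Only loci present in both profiles can be compared; a locus absent
    # from either dictionary is treated as missing.
    for locus in a.keys() & b.keys():
        allele_a, allele_b = a[locus], b[locus]
        if allele_a is None or allele_b is None:
            continue  # missing in at least one assembly: ignored
        if allele_a != allele_b:
            distance += 1
    return distance


genome_1 = {"locus_1": "1", "locus_2": "2", "locus_3": None, "locus_4": "7"}
genome_2 = {"locus_1": "1", "locus_2": "5", "locus_3": "3", "locus_4": "8"}
print(cgmlst_distance(genome_1, genome_2))  # locus_2 and locus_4 differ
```

Because missing loci are skipped rather than counted, two assemblies with many missing loci can appear closer than they really are, so low distances are best interpreted alongside assembly quality.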

The assemblies are then grouped using single linkage clustering based on the calculated pairwise distances.
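At a fixed threshold, single linkage clustering is equivalent to finding the connected components of a graph in which two genomes are joined whenever their distance is within the threshold. The sketch below shows this with a small union-find structure. The function name, the dictionary-of-pairs input, and the inclusive comparison (`<=`) are assumptions for illustration and are not taken from the Pathogenwatch code.

```python
from typing import Dict, Iterable, List, Set, Tuple


def single_linkage_clusters(
    names: Iterable[str],
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> List[Set[str]]:
    """Group genomes into single linkage clusters at one distance threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Any pair within the threshold merges the two clusters it touches.
    for (a, b), d in distances.items():
        if d <= threshold:  # assumption: the threshold is inclusive
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for name in parent:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


example_distances = {
    ("A", "B"): 2, ("B", "C"): 3, ("A", "C"): 5,
    ("A", "D"): 20, ("B", "D"): 20, ("C", "D"): 20,
}
print(single_linkage_clusters("ABCD", example_distances, threshold=3))
```

In this example A and C differ at 5 loci, yet at a threshold of 3 they fall in the same cluster because both are linked through B. This chaining is characteristic of single linkage.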

A network graph is shown on each genome report page, highlighting the selected genome and the genomes connected to it at a given threshold. The threshold is the number of allele differences allowed between the selected genome and other genomes. Genomes that are not directly connected to the selected genome but belong to the same single linkage cluster at the selected threshold are also shown.
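One way to read the distinction between the two groups of genomes in the graph is sketched below: genomes directly within the threshold of the selected genome, and genomes reached only through intermediate links. This is an interpretation of the description above, not the code behind the viewer; `network_members`, its input format, and the inclusive threshold are hypothetical.

```python
from collections import deque
from typing import Dict, Set, Tuple


def network_members(
    selected: str,
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> Tuple[Set[str], Set[str]]:
    """Return (direct, indirect) members of the selected genome's cluster."""
    # Build an adjacency map of all pairs within the threshold.
    neighbours: Dict[str, Set[str]] = {}
    for (a, b), d in distances.items():
        if d <= threshold:  # assumption: the threshold is inclusive
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    direct = set(neighbours.get(selected, set()))

    # Breadth-first search collects everything reachable from the selection.
    seen = {selected}
    queue = deque([selected])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    indirect = seen - direct - {selected}
    return direct, indirect


pairs = {("A", "B"): 2, ("B", "C"): 3, ("A", "C"): 5, ("A", "D"): 20}
direct, indirect = network_members("A", pairs, threshold=3)
print(sorted(direct), sorted(indirect))
```

With A selected and a threshold of 3, B is a direct neighbour, C is included only through its link to B, and D is outside the cluster.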

How to cite

The cgMLST clustering tool was first described in:

Sánchez-Busó L, Yeats CA, Taylor B, et al. A community-driven resource for genomic epidemiology and antimicrobial resistance prediction of Neisseria gonorrhoeae at Pathogenwatch. Genome Med. 2021;13(1):61. Published 2021 Apr 19. doi:10.1186/s13073-021-00858-2

The software is available under an open-source licence from:

https://github.com/pathogenwatch-oss/cgmlst-clustering