cgMLST Clustering

About

Genomes that have a cgMLST scheme can be clustered from their genome report page. Clustering helps to identify closely related genomes, which may indicate a transmission event or outbreak.

Methods

cgMLST profiles are calculated for assemblies when they are uploaded, provided a suitable scheme is available. Distances are then calculated between every pair of assemblies that share a cgMLST scheme. The distance between two assemblies is the number of loci in the scheme at which their alleles differ, ignoring any loci that are missing (possibly due to sequencing or assembly errors).
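The distance described above can be sketched in a few lines of Python. This is an illustration only, not the Pathogenwatch implementation: the profile representation (a dictionary from locus name to allele identifier, with None for a missing locus) and the function name cgmlst_distance are assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical representation: locus name -> allele identifier,
# with None marking a locus that could not be called in the assembly.
Profile = Dict[str, Optional[str]]


def cgmlst_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have an allele call and the calls differ."""
    shared_loci = a.keys() & b.keys()
    return sum(
        1
        for locus in shared_loci
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


genome_a = {"locus1": "1", "locus2": "4", "locus3": None, "locus4": "7"}
genome_b = {"locus1": "1", "locus2": "5", "locus3": "2", "locus4": "9"}

# locus2 and locus4 differ; locus3 is missing in genome_a and is ignored.
print(cgmlst_distance(genome_a, genome_b))  # 2
```

Ignoring missing loci means an incomplete assembly is not pushed further away from its neighbours simply because some loci could not be called.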

The assemblies are then clustered using single linkage clustering based on these pairwise distances.
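As a rough sketch of what single linkage clustering means at one fixed threshold: two genomes end up in the same cluster if they can be joined by a chain of pairwise links, each within the threshold. The example below uses a small union-find structure to do this. The function name, the input format and the choice of "less than or equal to" for the threshold comparison are assumptions for illustration, not details taken from the Pathogenwatch software.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    names: List[str],
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> List[List[str]]:
    """Group genomes that are chained together by links of distance <= threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Walk up to the cluster representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:  # assumed inclusive threshold
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


names = ["A", "B", "C", "D"]
distances = {
    ("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 8,
    ("A", "D"): 20, ("B", "D"): 25, ("C", "D"): 22,
}

print(single_linkage_clusters(names, distances, threshold=5))
# [['A', 'B', 'C'], ['D']]
```

Note that A and C are 8 differences apart, yet they share a cluster at a threshold of 5 because both are within 5 of B. This chaining is the defining property of single linkage.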

A network graph on each genome page highlights the selected genome and the genomes connected to it at a given threshold. The threshold is the cut-off on the number of allele differences between the selected genome and other genomes. Genomes that are not directly connected to the selected genome, but belong to the same single linkage cluster at the selected threshold, are also shown.
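The two groups of genomes in the graph can be illustrated with a short sketch: genomes linked directly to the selected genome, and genomes that are only reachable through a chain of links. The function name cluster_view, the input format and the inclusive threshold comparison are assumptions made for this example and do not describe how the genome page is actually implemented.

```python
from collections import deque
from typing import Dict, Set, Tuple


def cluster_view(
    selected: str,
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> Tuple[Set[str], Set[str]]:
    """Return (directly linked genomes, other genomes in the same cluster)."""
    # Build an adjacency map containing only links within the threshold.
    neighbours: Dict[str, Set[str]] = {}
    for (a, b), d in distances.items():
        if d <= threshold:  # assumed inclusive threshold
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    direct = set(neighbours.get(selected, set()))

    # Breadth-first search collects everything reachable from the selected genome.
    seen = {selected}
    queue = deque([selected])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    indirect = seen - direct - {selected}
    return direct, indirect


pairwise = {
    ("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 8,
    ("A", "D"): 20, ("B", "D"): 25, ("C", "D"): 22,
}

direct, indirect = cluster_view("A", pairwise, threshold=5)
print(sorted(direct), sorted(indirect))  # ['B'] ['C']
```

Here B is within the threshold of the selected genome A, while C appears only because it is linked to B.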

How to cite

The cgMLST clustering tool was first described in:

Sánchez-Busó L, Yeats CA, Taylor B, et al. A community-driven resource for genomic epidemiology and antimicrobial resistance prediction of Neisseria gonorrhoeae at Pathogenwatch. Genome Med. 2021;13(1):61. Published 2021 Apr 19. doi:10.1186/s13073-021-00858-2

The software is available under an open-source licence from https://github.com/pathogenwatch-oss/cgmlst-clustering
