cgMLST Clustering

About

Genomes that have a cgMLST scheme can be clustered from their genome report page. Clustering helps to identify similar sequences, which may indicate a transmission event or outbreak.

Methods

cgMLST profiles are calculated for assemblies when they are uploaded, provided a suitable scheme is available. Distances are then calculated between every pair of assemblies that share a given cgMLST scheme. The distance between two assemblies is the number of loci in the scheme at which their alleles differ, ignoring any loci that are missing (possibly due to sequencing or assembly errors).
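
The distance calculation can be illustrated with a minimal Python sketch. This is not the site's actual implementation: the profile representation (a dictionary mapping locus name to allele identifier, with None for a missing locus), the function name cgmlst_distance and the example genomes are all hypothetical. The sketch also assumes that a locus is ignored when it is missing from either of the two assemblies being compared.

```python
from typing import Dict, Optional

# Hypothetical representation: locus name -> allele identifier,
# with None marking a locus that could not be called.
Profile = Dict[str, Optional[str]]


def cgmlst_distance(a: Profile, b: Profile) -> int:
    """Count loci whose alleles differ, skipping loci missing in either profile."""
    differences = 0
    # Only loci present as keys in both profiles can be compared.
    for locus in a.keys() & b.keys():
        allele_a, allele_b = a[locus], b[locus]
        if allele_a is None or allele_b is None:
            continue  # assumption: a missing locus does not count as a difference
        if allele_a != allele_b:
            differences += 1
    return differences


# Made-up example profiles for two assemblies sharing one scheme.
genome_1 = {"locus_1": "1", "locus_2": "4", "locus_3": None, "locus_4": "7"}
genome_2 = {"locus_1": "1", "locus_2": "5", "locus_3": "2", "locus_4": "9"}

print(cgmlst_distance(genome_1, genome_2))  # 2 (locus_2 and locus_4 differ)
```

Here locus_3 is skipped because it is missing from genome_1, so only locus_2 and locus_4 contribute to the distance.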

The assemblies are then grouped by single linkage clustering on these pairwise distances.
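
As a rough illustration of single linkage clustering at one threshold, the sketch below merges any two assemblies whose distance is within the threshold, so that chains of close pairs end up in the same cluster. The function name single_linkage_clusters, the inclusive comparison (distance <= threshold) and the example distances are assumptions made for illustration, not details taken from the actual implementation.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    names: List[str],
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> List[List[str]]:
    """Group names so that any pair within `threshold` shares a cluster."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), distance in distances.items():
        if distance <= threshold:  # assumption: the threshold is inclusive
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up pairwise distances between four assemblies.
example_names = ["A", "B", "C", "D"]
example_distances = {
    ("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 7,
    ("A", "D"): 20, ("B", "D"): 25, ("C", "D"): 22,
}

print(single_linkage_clusters(example_names, example_distances, 5))
# [['A', 'B', 'C'], ['D']]
```

A and C are 7 differences apart, yet they share a cluster at a threshold of 5 because each is within the threshold of B. This chaining is the defining property of single linkage.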

A network graph is shown on each Genome Page. It highlights the selected genome and the genomes connected to it at a given threshold, where the threshold is the number of allele differences between the selected genome and other genomes. Other genomes that belong to the same single linkage cluster at the selected threshold are also shown.
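
One way to picture the two groups of genomes in the graph is the sketch below. It separates genomes directly within the threshold of the selected genome from those reached only through a chain of intermediate genomes. The function name graph_members, the inclusive threshold and the sample distances are hypothetical, and the real graph may be built differently.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def graph_members(
    selected: str,
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> Tuple[List[str], List[str]]:
    """Return (directly connected genomes, genomes linked only via others)."""
    # Build an undirected graph with an edge for each pair within the threshold.
    neighbours: Dict[str, Set[str]] = {}
    for (a, b), distance in distances.items():
        if distance <= threshold:  # assumption: the threshold is inclusive
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    direct = set(neighbours.get(selected, set()))

    # A breadth-first search collects everything in the selected genome's cluster.
    seen = {selected}
    queue = deque([selected])
    while queue:
        current = queue.popleft()
        for neighbour in neighbours.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    indirect = seen - direct - {selected}
    return sorted(direct), sorted(indirect)


# Made-up pairwise distances between four genomes.
sample_distances = {
    ("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 7,
    ("A", "D"): 20, ("B", "D"): 25, ("C", "D"): 22,
}

print(graph_members("A", sample_distances, 5))  # (['B'], ['C'])
```

With A selected and a threshold of 5, B is directly connected to A, while C appears only because it is within the threshold of B.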
