About MLST

MLST schemes are built around a community-agreed set of 7 gene loci present in all strains of the species. A database of validated allele sequences is maintained for each locus, and each allele is assigned a code. A sequence type ("ST") code is then derived from the unique combination of alleles. The schemes supported by Pathogenwatch are provided by PubMLST, while an in-house search tool is used to determine the MLST assignment rapidly and accurately.
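
As a rough illustration of how an ST follows from the combination of alleles, the sketch below looks up a profile in a small table. The locus names, allele numbers, and ST values are invented for the example; real schemes and profile tables come from PubMLST, and this is not the Pathogenwatch search tool itself.

```python
from typing import Dict, Optional, Tuple

# Hypothetical locus names: a real scheme defines its own seven loci.
LOCI = ("locusA", "locusB", "locusC", "locusD", "locusE", "locusF", "locusG")

# Hypothetical profile table: each unique combination of allele codes maps to one ST.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (2, 1, 3, 1, 1, 4, 1): 2,
}


def lookup_st(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a full set of allele codes, or None if the combination is unknown."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)
```

Calling `lookup_st` with the allele codes 2, 1, 3, 1, 1, 4, 1 (in locus order) returns the ST stored for that combination, while a combination absent from the table returns `None`, which corresponds to a novel profile.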

Novel allele and MLST codes are indicated by a "*" character at the start of the code. Such a code may also contain letters instead of numbers. Novel allele codes remain consistent between releases unless they are replaced by an assignment from the host scheme.

If your profile includes novel alleles or a novel MLST code, we recommend visiting the source database linked from the results page and submitting your genome there. The resulting assignments will then be imported into Pathogenwatch at the next update.


The assembly is first searched for exact matches to known alleles. A representative set of alleles for each locus is then searched for using BLAST. The results of these searches are combined and filtered based on the similarity and length of each match. Each novel allele is hashed using the SHA-1 algorithm, and the hash is used as its unique identifier. Profiles are assigned based on the combination of alleles detected. Novel profiles are also given a unique identifier using the SHA-1 hash algorithm.
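
The hashing step can be sketched as follows, using Python's standard `hashlib`. The details are assumptions made for illustration and are not confirmed by this page: the sequence is lower-cased before hashing, and a profile is hashed from its allele codes joined with underscores. The function names are hypothetical.

```python
import hashlib
from typing import Iterable


def novel_allele_id(sequence: str) -> str:
    """SHA-1 hex digest of an allele sequence (assumed normalisation: trimmed, lower case)."""
    normalised = sequence.strip().lower()
    return hashlib.sha1(normalised.encode("ascii")).hexdigest()


def novel_profile_id(allele_codes: Iterable[object]) -> str:
    """SHA-1 hex digest of a profile (assumed encoding: codes joined with underscores)."""
    joined = "_".join(str(code) for code in allele_codes)
    return hashlib.sha1(joined.encode("ascii")).hexdigest()
```

Because the identifier depends only on the sequence (or on the allele combination), the same novel allele receives the same code in every assembly and release, which is consistent with the codes staying stable until the host scheme assigns an official one.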


For each assembly, the assigned allele codes and the combined ST code are provided. If a locus is missing, the allele is shown as a question mark; if the allele is novel, a four-letter code that uniquely represents it is shown instead. In the Collection View, a novel ST that results from a new combination of known alleles is shown as a unique four-letter code, while a novel ST that results from a new allele is additionally marked with an asterisk ("*").
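
The display rules above can be sketched as below. This is only an illustration under stated assumptions: known codes are numeric strings, novel codes are longer hash strings, the four-letter code is taken to be the first four characters of the hash, and the asterisk is placed at the start of the code. None of these details, nor the function names, are confirmed by this page.

```python
from typing import Optional, Sequence


def display_allele(code: Optional[str]) -> str:
    """Render one allele: '?' if missing, the number if known, a short code if novel."""
    if code is None:
        return "?"
    if code.isdigit():
        return code
    return code[:4]  # assumption: short code = first four characters of the hash


def display_st(st: str, alleles: Sequence[Optional[str]]) -> str:
    """Render an ST: known STs unchanged, novel STs as a short code, '*' if a novel allele is involved."""
    if st.isdigit():
        return st
    short = st[:4]
    has_novel_allele = any(a is not None and not a.isdigit() for a in alleles)
    return "*" + short if has_novel_allele else short
```

For example, a novel profile made up entirely of known alleles renders as a bare four-character code, whereas the same profile containing a novel allele renders with a leading asterisk.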
