Reference Assignment

Last updated 11 months ago

About

Each assembly is linked to its nearest reference assembly by comparing the substitutions in its core profile with those in each reference core profile. The reference assignment is then used to identify potentially unreliable loci in the query assembly, according to the variation filter method described in the Core Filter section.

For some species (e.g. Salmonella Typhi), assemblies with the same reference assignment are clustered to provide a more fine-grained view, which is useful for large collections in the Collection View.

Method

Creating The Reference Variance Profile

  1. The core profile is generated for each reference assembly.

  2. All substitutions, excluding those with non-ATCG characters, are extracted and aggregated into a single list of variant locations per gene family.
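
The two steps above can be sketched roughly as follows. This is a minimal illustration, not Pathogenwatch's implementation: the nested-dictionary layout of a core profile, the function name `build_variance_profile`, and the constant `VALID_BASES` are all assumptions made for the example.

```python
from collections import defaultdict

# Only unambiguous bases count as usable substitutions.
VALID_BASES = set("ACGT")


def build_variance_profile(reference_profiles):
    """Aggregate variant positions per gene family across all references.

    reference_profiles uses a hypothetical layout:
        {reference_name: {family_id: {position: substituted_base}}}
    Returns {family_id: sorted list of variant positions}.
    """
    sites = defaultdict(set)
    for families in reference_profiles.values():
        for family_id, substitutions in families.items():
            for position, base in substitutions.items():
                # Skip substitutions with non-ATCG characters (e.g. "N").
                if base.upper() in VALID_BASES:
                    sites[family_id].add(position)
    return {family_id: sorted(positions) for family_id, positions in sites.items()}


# Example with two made-up references.
references = {
    "refA": {"fam1": {10: "A", 25: "N"}, "fam2": {3: "T"}},
    "refB": {"fam1": {10: "G", 40: "c"}},
}
profile = build_variance_profile(references)
```

In this example position 25 of `fam1` is dropped because its only substitution is an ambiguous `N`, while position 10 is recorded once even though both references vary there.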

Querying the Variance Profile

  1. Each assembly is compared against each reference at all the sites in the species profile, excluding sites outside the boundaries of any fragment matches.

  2. The total number of sites in common is divided by the total number of compared sites to generate a similarity score.

  3. The query assembly is then assigned to the subgroup identified by the name of the most similar reference. If two references have the same score, alphabetical order of the reference names is used to break the tie.
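
The query procedure above can be sketched as a short scoring routine. This is an illustration under stated assumptions rather than the actual Pathogenwatch code: sites are keyed by a hypothetical `(family_id, position)` tuple, "sites in common" is read as sites where the query and reference carry the same base, sites outside any fragment match are simply absent from the query, and ties go to the alphabetically first reference name.

```python
def similarity(query_bases, reference_bases):
    """Fraction of compared sites at which query and reference agree.

    Both arguments map a site key, e.g. (family_id, position), to a base.
    Sites missing from the query (outside any fragment match) are skipped.
    """
    compared = [site for site in reference_bases if site in query_bases]
    if not compared:
        return 0.0
    shared = sum(
        1 for site in compared if query_bases[site] == reference_bases[site]
    )
    return shared / len(compared)


def assign_reference(query_bases, references):
    """Return (name, score) of the most similar reference.

    Ties are resolved by taking the alphabetically first name, which is an
    assumption about how the alphabetical rule is applied.
    """
    scores = {
        name: similarity(query_bases, bases) for name, bases in references.items()
    }
    # Sort key: highest score first, then name in alphabetical order.
    best = min(scores, key=lambda name: (-scores[name], name))
    return best, scores[best]


# Made-up data: fam1 position 40 lies outside the query's fragment matches.
references = {
    "refB": {("fam1", 10): "A", ("fam1", 40): "C", ("fam2", 3): "T"},
    "refA": {("fam1", 10): "A", ("fam1", 40): "G", ("fam2", 3): "T"},
    "refC": {("fam1", 10): "G", ("fam1", 40): "G", ("fam2", 3): "A"},
}
query = {("fam1", 10): "A", ("fam2", 3): "T"}
assigned, score = assign_reference(query, references)
```

Here `refA` and `refB` both agree with the query at every compared site, so the tie is broken by name and the query is assigned to `refA`.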
