# HierCC

HierCC = Hierarchical Clustering of cgMLST (used in EnteroBase), which produces codes representing population assignments

cgMLST = core genome Multi-Locus Sequence Typing

## Purpose

This tool robustly links Pathogenwatch cgMLST profiles to HierCC clusters from EnteroBase. The purpose is two-fold:

1. to bring HierCC labelling into Pathogenwatch
2. to let end users discover potential new outbreak members via EnteroBase

It allows a new genome to be matched to the nearest EnteroBase cgMLST profile, and its HierCC (Zhou, Charlesworth, Achtman; 2021) cluster identifiers to be inferred from the profile distance. HierCC annotations provide both a familiar cluster naming scheme and a complementary way of linking genomes alongside the Pathogenwatch clustering.

## About *hclink*

Each version of *hclink* includes the complete set of cgMLST profiles and linked HierCC codes that were available on the day it was created. Given an input profile (a genome's cgMLST profile), *hclink* rapidly searches all the available profiles to find the nearest one by raw number of allelic differences. It then calculates the corrected HierCC distance to that profile and uses it to infer the HierCC cluster codes up to the threshold indicated by the corrected distance.
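The search-then-infer procedure can be sketched in Python as below. This is a toy illustration, not hclink's implementation, and it rests on several assumptions: missing alleles are encoded as `0`; the "corrected distance" is approximated by a hypothetical correction that scales mismatches over shared loci up to the full scheme length; and a query inherits the nearest profile's cluster at every HierCC level whose threshold is at least the corrected distance. The names `raw_differences`, `corrected_distance` and `link`, and all profiles and codes, are hypothetical.

```python
from typing import Dict, Sequence, Tuple

MISSING = 0  # assumption: allele 0 marks a missing (uncalled) locus


def raw_differences(a: Sequence[int], b: Sequence[int]) -> int:
    """Count loci called in both profiles whose alleles differ."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)


def corrected_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical missing-data correction: scale mismatches over shared
    loci up to the full scheme length (not necessarily hclink's formula)."""
    shared = sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING)
    if shared == 0:
        return float(len(a))  # nothing comparable: treat as maximally distant
    return raw_differences(a, b) * len(a) / shared


def link(
    query: Sequence[int],
    profiles: Dict[str, Sequence[int]],
    hiercc: Dict[str, Dict[int, int]],
) -> Tuple[str, float, Dict[int, int]]:
    # Step 1: search all profiles for the nearest one by raw difference count.
    nearest = min(profiles, key=lambda sid: raw_differences(query, profiles[sid]))
    # Step 2: compute the corrected distance to that single profile only.
    dist = corrected_distance(query, profiles[nearest])
    # Step 3: keep the nearest profile's cluster at every level >= dist.
    codes = {level: code for level, code in hiercc[nearest].items() if level >= dist}
    return nearest, dist, codes


# Toy 10-locus scheme with hypothetical HierCC codes keyed by threshold level.
profiles = {
    "A": [1] * 10,
    "B": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
}
hiercc = {
    "A": {0: 101, 2: 55, 5: 12, 10: 3},
    "B": {0: 202, 2: 60, 5: 12, 10: 3},
}
query = [1, 1, 1, 1, 1, 1, 1, 1, 2, MISSING]
print(link(query, profiles, hiercc))  # nearest "A"; codes kept for levels 2, 5, 10
```

Here the query differs from `A` at one called locus out of nine shared, giving a corrected distance of 10/9, so the HC0 code is dropped while the HC2, HC5 and HC10 codes are inherited.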

Note that this process is heuristic and may select a non-optimal profile when there are large numbers of missing alleles. Because the distances involved would generally be larger, this should not affect outbreak cluster identification, which is usually limited to low cluster thresholds.
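The toy example below illustrates how a raw-difference search can pick a non-optimal profile when alleles are missing. It uses the same hypothetical correction as an assumption (mismatches scaled over shared loci to the full scheme length, with `0` marking a missing allele), which may differ from the corrected HierCC distance hclink actually computes. A sparsely called profile has fewer raw differences simply because fewer loci can be compared, yet its corrected distance is larger.

```python
from typing import Sequence

MISSING = 0  # assumption: allele 0 marks a missing (uncalled) locus


def raw_differences(a: Sequence[int], b: Sequence[int]) -> int:
    """Count loci called in both profiles whose alleles differ."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)


def corrected_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical correction: scale mismatches over shared loci to full length."""
    shared = sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING)
    if shared == 0:
        return float(len(a))
    return raw_differences(a, b) * len(a) / shared


query = [1] * 10
database = {
    "complete": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],  # 2 differences, no missing loci
    "sparse": [2, 1] + [MISSING] * 8,            # only 2 loci called
}

raw = {sid: raw_differences(query, p) for sid, p in database.items()}
corrected = {sid: corrected_distance(query, p) for sid, p in database.items()}
picked = min(raw, key=raw.get)        # what a raw-difference search selects
best = min(corrected, key=corrected.get)  # what the corrected distance prefers

print(raw)        # {'complete': 2, 'sparse': 1}
print(corrected)  # {'complete': 2.0, 'sparse': 5.0}
print(picked, best)  # sparse complete
```

The mis-selected profile sits at a larger corrected distance (5.0 here), so under these assumptions it could only disturb codes at higher thresholds, not the low thresholds used for outbreak clusters.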

## Validation of the HierCC linking tool for PATH-SAFE

The full validation report is available [here](https://docs.google.com/document/d/1EdqtjrjyATLTd40IQedySXN3s7mm-eCx4MJ9JM9v0n8/edit#heading=h.qoa64rde0n93).

## Code Repository

<https://github.com/pathogenwatch-oss/hclink>


