SARS-CoV-2 Genome Tree

How SARS-CoV-2 trees are constructed in Pathogenwatch.

About

Pathogenwatch automatically generates a tree of SARS-CoV-2 genomes when a collection is created from the Genome Browser. When a genome is uploaded to Pathogenwatch, its alignment against the Wuhan-Hu-1 reference genome is stored. The stored alignments of the selected genomes are then combined into a multiple sequence alignment, and a dendrogram is produced using FastTree. This tree is then displayed in the interactive collection viewer.

Figure: SARS-CoV-2 tree built using the Pathogenwatch pipeline.

Method

Alignment

  • Each genome is mapped against the Wuhan-Hu-1 reference genome (NCBI Reference Sequence: NC_045512.2) using minimap2.
  • The resulting SAM file from each genome is converted into FASTA format using goFASTA.
  • The aligned FASTA output is stored in Pathogenwatch.
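
The alignment steps above can be sketched as two external commands driven from Python. This is a minimal sketch, not the Pathogenwatch implementation: the function names and file paths are hypothetical, and the source does not state which minimap2 or goFASTA options are used. In particular, the `gofasta sam toMultiAlign -s ... -o ...` invocation is an assumption about the goFASTA command line and should be checked against the installed version.

```python
import subprocess


def alignment_commands(genome_fasta, reference_fasta, sam_path, aligned_fasta):
    """Return the command lines for mapping a genome and converting SAM to FASTA."""
    # minimap2: -a requests SAM output, -o names the output file.
    # The reference (target) comes first, then the query genome.
    map_cmd = ["minimap2", "-a", "-o", sam_path, reference_fasta, genome_fasta]
    # Assumption: goFASTA subcommand and flags; not given in the source text.
    convert_cmd = ["gofasta", "sam", "toMultiAlign", "-s", sam_path, "-o", aligned_fasta]
    return [map_cmd, convert_cmd]


def align_genome(genome_fasta, reference_fasta, sam_path, aligned_fasta):
    """Run both steps in order, raising CalledProcessError if either tool fails."""
    for cmd in alignment_commands(genome_fasta, reference_fasta, sam_path, aligned_fasta):
        subprocess.run(cmd, check=True)
```

Calling `align_genome("sample.fasta", "NC_045512.2.fasta", "sample.sam", "sample.aligned.fasta")` would run both tools, provided they are on the `PATH`; the file names here are placeholders.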

Tree Building

  • The FASTA files are concatenated into a multiple sequence alignment along with the Wuhan-Hu-1 reference.
  • FastTree is run on the alignment with the options -gtr -nosupport -nt.
  • The resulting tree is rooted on the reference.
  • The reference is then removed from the tree.
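
The first two tree-building steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the Pathogenwatch code: the helper names are hypothetical, the executable is assumed to be installed as `FastTree`, and the length check is an added safeguard that the source does not mention. Rooting on the reference and pruning it are left to a tree-handling library and are not shown.

```python
import subprocess


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_msa(reference_fasta, genome_fastas):
    """Concatenate the reference and the per-genome aligned FASTA texts."""
    records = read_fasta(reference_fasta)
    for text in genome_fastas:
        records.extend(read_fasta(text))
    # Every sequence was aligned to the same reference, so all rows
    # should have the same length (added sanity check, an assumption).
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def run_fasttree(msa_text, fasttree="FastTree"):
    """Run FastTree with the documented options and return the Newick tree text."""
    result = subprocess.run(
        [fasttree, "-gtr", "-nosupport", "-nt"],
        input=msa_text,  # FastTree reads the alignment from standard input
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because the reference is included in the alignment, it appears as a leaf in the FastTree output, which is what allows the tree to be rooted on it before it is removed.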

References