SARS-CoV-2 Genome Tree
How SARS-CoV-2 trees are constructed in Pathogenwatch.
Pathogenwatch automatically generates a tree of SARS-CoV-2 genomes when a collection is created from the Genome Browser. When each genome is uploaded to Pathogenwatch, it is aligned against the Wuhan-Hu-1 reference genome and the alignment is stored. When a collection is created, the stored alignments of the selected genomes are combined into a multiple sequence alignment, and a dendrogram is produced from it using FastTree. This tree is then displayed in the interactive collection viewer.
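Because every genome is stored already aligned to the same reference, building the multiple sequence alignment for a collection can be a plain concatenation rather than a fresh alignment. The Python sketch below illustrates that idea only. The function names `read_fasta` and `build_msa` are hypothetical, and it assumes that each stored sequence is in reference coordinates and therefore has the same length as the reference; it is not Pathogenwatch's own code.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_msa(reference_fasta, genome_fastas):
    """Concatenate the reference and pre-aligned genomes into one alignment.

    Assumption: every sequence is already in reference coordinates, so a
    length check is enough to confirm the rows form a valid alignment.
    """
    [(ref_name, ref_seq)] = read_fasta(reference_fasta)
    rows = [(ref_name, ref_seq)]
    for text in genome_fastas:
        rows.extend(read_fasta(text))
    for name, seq in rows:
        if len(seq) != len(ref_seq):
            raise ValueError(
                f"{name}: length {len(seq)} != reference length {len(ref_seq)}"
            )
    return "".join(f">{name}\n{seq}\n" for name, seq in rows)
```

The length check is the only validation in this sketch: a sequence that was not aligned to the reference is rejected before it reaches the tree-building step.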
Each genome is mapped against the Wuhan-Hu-1 reference genome (NCBI Reference Sequence: NC_045512.2) using minimap2.
The resulting SAM file from each genome is converted into FASTA format using goFASTA.
The aligned FASTA output is stored in Pathogenwatch.
The FASTA files are concatenated into a multiple sequence alignment along with the Wuhan-Hu-1 reference.
Run FastTree with the options -gtr -nosupport -nt.
Root the resulting tree on the reference.
Remove the reference from the tree.
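The last three steps can be sketched in Python as follows. This is a minimal illustration, not Pathogenwatch's implementation: the function names, the assumption that the binary is installed as `FastTree`, and the small Newick parser are all hypothetical. The parser handles only plain, unquoted labels such as those FastTree writes with -nosupport. Rooting on the reference and then removing it is implemented here by making the node adjacent to the reference leaf the new root and dropping the reference leaf.

```python
from itertools import count


def fasttree_command(alignment_path):
    """FastTree argv with the options listed above; the tree goes to stdout."""
    # Assumption: the binary is on the PATH and named "FastTree".
    return ["FastTree", "-gtr", "-nosupport", "-nt", alignment_path]


def newick_to_graph(newick):
    """Parse plain Newick into an undirected graph plus node labels."""
    text = newick.strip().rstrip(";")
    adj, names, ids = {}, {}, count()  # adj: node -> {neighbour: length}
    pos = 0

    def parse_node():
        nonlocal pos
        node = next(ids)
        adj[node] = {}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                child, length = parse_node()
                adj[node][child] = adj[child][node] = length
                if text[pos] != ",":
                    break
                pos += 1  # consume ","
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        if pos > start:
            names[node] = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            start = pos = pos + 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return node, length

    parse_node()
    return adj, names


def root_and_drop_reference(newick, reference):
    """Root the tree on the reference leaf, then remove that leaf."""
    adj, names = newick_to_graph(newick)
    ref = next(n for n, label in names.items() if label == reference)
    (new_root,) = adj[ref]  # a leaf has exactly one neighbour
    del adj[new_root][ref], adj[ref]

    def write(node, parent):
        children = [n for n in adj[node] if n != parent]
        if not children:
            return names[node]
        inner = ",".join(f"{write(c, node)}:{adj[node][c]:g}" for c in children)
        return f"({inner})"

    return write(new_root, None) + ";"
```

In use, the output of running `fasttree_command("collection.fasta")` (for example with `subprocess.run(..., capture_output=True, text=True)`) would be passed to `root_and_drop_reference` together with the reference's sequence name. The recursive writer is fine for small trees; a very deep tree would need an iterative traversal or a dedicated phylogenetics library.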
FastTree: Price, M.N., Dehal, P.S., and Arkin, A.P. (2010) FastTree 2 -- Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE, 5(3):e9490. doi:10.1371/journal.pone.0009490.
goFASTA: https://github.com/cov-ert/gofasta
minimap2: Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191.