SARS-CoV-2 Genome Tree
How SARS-CoV-2 trees are constructed in Pathogenwatch.
About
Pathogenwatch automatically generates a tree of SARS-CoV-2 genomes when a collection is created from the Genome Browser. When each genome is uploaded to Pathogenwatch, an alignment against the Wuhan-Hu-1 reference genome is stored. When a collection is created, the stored alignments of the selected genomes are combined into a multiple sequence alignment, and a dendrogram is produced from it using FastTree. This tree is then displayed in the interactive collection viewer.
Method
Alignment
Each genome is mapped against the Wuhan-Hu-1 reference genome (NCBI Reference Sequence: NC_045512.2) using minimap2.
The resulting SAM file from each genome is converted into FASTA format using goFASTA.
The aligned FASTA output is stored in Pathogenwatch.
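
The alignment steps above can be sketched as two command lines assembled in Python. This is only an illustration, not Pathogenwatch's actual invocation: the file names, the helper names `minimap2_command`, `gofasta_command` and `align_genome`, and the choice of flags are assumptions. In particular, the goFASTA subcommand and its `-s`/`-o` options should be checked against `gofasta sam toMultiAlign --help` for the installed version.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed local copy of the Wuhan-Hu-1 reference (NC_045512.2); the path is hypothetical.
REFERENCE = Path("NC_045512.2.fasta")


def minimap2_command(genome: Path, reference: Path = REFERENCE) -> List[str]:
    """Build a minimap2 call that maps one genome against the reference.

    `-a` makes minimap2 write SAM to stdout instead of its default PAF.
    """
    return ["minimap2", "-a", str(reference), str(genome)]


def gofasta_command(sam: Path, fasta_out: Path) -> List[str]:
    """Build a goFASTA call that converts a SAM file into an aligned FASTA.

    Assumption: the `sam toMultiAlign` subcommand with `-s` (SAM input)
    and `-o` (FASTA output); verify against your goFASTA version.
    """
    return ["gofasta", "sam", "toMultiAlign", "-s", str(sam), "-o", str(fasta_out)]


def align_genome(genome: Path, workdir: Path) -> Path:
    """Run both steps for one genome and return the aligned FASTA path."""
    sam = workdir / f"{genome.stem}.sam"
    aligned = workdir / f"{genome.stem}.aligned.fasta"
    with sam.open("w") as handle:
        # minimap2 writes the SAM records to stdout, so capture them in a file.
        subprocess.run(minimap2_command(genome), stdout=handle, check=True)
    subprocess.run(gofasta_command(sam, aligned), check=True)
    return aligned
```

Calling `align_genome(Path("sample.fasta"), Path("work"))` would require both tools on the `PATH`; the two command builders can be inspected without running anything.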
Tree Building
The stored FASTA alignments of the selected genomes are concatenated, together with the Wuhan-Hu-1 reference, into a multiple sequence alignment.
Run FastTree with the options -gtr -nosupport -nt.
Root the resulting tree on the reference.
Remove the reference from the tree.
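
The first two tree-building steps can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `concatenate_alignments` and `fasttree_command` and all file names are hypothetical, the executable is assumed to be installed as `FastTree`, and each stored FASTA is assumed to be already aligned to the reference coordinates, so plain concatenation gives a valid multiple sequence alignment. Only the options `-gtr -nosupport -nt` come from the method described above.

```python
from pathlib import Path
from typing import Iterable, List


def concatenate_alignments(reference: Path, aligned: Iterable[Path], out: Path) -> int:
    """Write the reference followed by each stored alignment into one FASTA file.

    Returns the number of sequence records written (counted by '>' headers).
    """
    count = 0
    with out.open("w") as handle:
        for path in [reference, *aligned]:
            text = path.read_text()
            count += sum(1 for line in text.splitlines() if line.startswith(">"))
            # Make sure each file ends with a newline so records do not run together.
            if text and not text.endswith("\n"):
                text += "\n"
            handle.write(text)
    return count


def fasttree_command(alignment: Path) -> List[str]:
    """Build the FastTree call with the options listed in the method.

    -gtr: generalised time-reversible model; -nt: nucleotide alignment;
    -nosupport: skip computing support values.
    """
    return ["FastTree", "-gtr", "-nosupport", "-nt", str(alignment)]
```

FastTree writes the Newick tree to standard output, so a caller would capture it, for example with `subprocess.run(fasttree_command(msa), capture_output=True, text=True, check=True).stdout`. Rooting on the reference and pruning it afterwards are not shown here; they would typically be done with a tree-handling library.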
References
FastTree: Price, M.N., Dehal, P.S., and Arkin, A.P. (2010) FastTree 2 -- Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE, 5(3):e9490. doi:10.1371/journal.pone.0009490.
goFASTA: https://github.com/cov-ert/gofasta
minimap2: Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191