Release Notes 2022
Pathogenwatch release notes for 2022
Last update of 2022. The next one is already in preparation for the New Year.
- Genotyphi has been updated to include the new 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168.1, and 22.214.171.124.1.1 lineages. All Typhi genomes have also been updated.
- Pangolin typing for SARS-CoV-2 has been updated to pangolin-data v1.17 and all genomes updated.
- We have released a new version of Speciator that fixes an issue where the previous version sometimes mislabelled a small percentage of E. coli as Shigella sonnei. The new version has been validated against more than 150,000 E. coli, and tens of thousands of Shigella (including S. flexneri) via clustering of MLST profiles. All other assignments should remain the same; please contact us if you have any mislabelled genomes. Thanks to Julio Diaz Caballero for identifying and characterising the issue, along with subsequent validation.
- The clustering service was tweaked to support Campylobacter.
- The Klebsiella pneumoniae complex cgMLST scheme is now also run against K. quasipneumoniae, allowing clustering of genomes.
- Another small performance improvement to the Genome View page.
- LIN codes, using the Pasteur nomenclature scheme, are now assigned for Klebsiella pneumoniae complex species. These are, in essence, a hierarchical lineage code based on cgMLST allele distance. For genomes in a novel lineage, the nearest neighbour is identified and the code inferred to the appropriate level of similarity. For complete details, see the documentation.
- LIN codes have been applied to all current K. pneumoniae genomes, and are available in the Genome Reports, Collection View and CSV downloads.
- The performance of the Genome List page has been further enhanced, and should generally load much quicker.
- The list of available linked collections in the Genome List is no longer available as a top-level summary. Unfortunately, the aggregation was too slow on larger accounts.
- The performance of the genome list page has been improved to allow further growth.
- Pangolin has been updated to pangolin v4.1.3 and pangolin-data v1.15.1. All SARS-CoV-2 genomes have been updated.
- The MLST, cgMLST and NGSTAR schemes have all been updated to a snapshot built on the 11th of September, and all genomes updated.
- When there is more than one match to an MLST/cgMLST locus and one (or more) of the matches has been assigned a code, only the coded match (or matches) is selected for the final profile. This results in cleaner assignments that are more consistent with PubMLST.
- The MLST/cgMLST software no longer fails when FASTA headers include tabs or other unusual characters. Genomes without previous assignments have been updated.
- The cgMLST-based clustering should now be faster, and less likely to fail.
- The Pangolin SARS-CoV-2 lineage assignment tool has been updated to the latest version of the data library (v1.14). As usual, all genomes have been updated to the latest version.
- The Pangolin SARS-CoV-2 lineage assignment tool has been updated to the latest version of the data library (v1.13). As usual, all genomes have been updated to the latest version.
- The Kleborate AMR Genotypes table CSV download in the collection view has been fixed and now correctly reports matched elements.
- The Campylobacter jejuni MLST scheme has been extended to Campylobacter coli. All C. coli genomes have been updated.
- The Enterobacter cloacae scheme has been extended to the other defined species of the E. cloacae complex, and the relevant genomes updated.
- The SARS-CoV-2 lineage tool Pangolin has been updated to pangolin-data v1.12. All SARS-CoV-2 genomes have been updated.
- Pangolin lineage assignments have been updated to Pangolin v4.1.1 and Pangolin-data v1.11. All SARS-CoV-2 genomes have been updated.
- Collection trees for Klebsiella quasipneumoniae and K. variicola. The core has been developed in collaboration with KlebNet.
- 56 K. quasipneumoniae genomes used as references.
- 41 K. variicola genomes used as references for tree building.
- All K. quasipneumoniae/variicola collections.
- Several internal releases have been rolled into a single external release.
- K/O locus types from Kleborate+Kaptive were added to the Collection View Typing table for Klebsiella species.
- A minor tweak has been made to the rendering of some Kleborate fields in the Collection View Typing table.
- Serotype assignments from SISTR have been added to the Collection View Typing table for Salmonella species. These were previously available in the Genome Reports and main downloads.
- Internal system updates.
- Pangolin was updated to use the new pangolin-data v1.9. All SARS-CoV-2 genomes have been updated to the latest assignments.
- K and OC polysaccharide locus types for Acinetobacter baumannii are now assigned using the Kaptive tool and shown in the Genome Reports and Collection views. CSV downloads are also available via the Genome List view.
- Fixed a bug that could cause the collection view to crash if not all Kleborate results had been calculated for the selected Klebsiella/Raoultella genomes.
- The header of the "core allele distribution" download has been fixed. An extra comma was appended to the first column title. The first column has also been renamed for clarity.
- Google Analytics has been replaced with the GDPR-compliant and privacy-focused Plausible Analytics. The CGPS uses a self-hosted instance, so no user data is shared with third parties in any form.
- Corrected the validation of DOI identifiers for genome and collection metadata.
- Some test data was leaking into the public website due to a bug in the "feature flag" checking.
- Clustering tasks should run more promptly than previously.
- It is now possible to provide links to literature references as either Pubmed IDs or DOI system identifiers for both genomes and collections.
- Pangolin has been updated to v4.0.6 and pangolin-data v1.8. All genomes have been updated.
- Reduced the maximum number of concurrent read assembly tasks to prevent them blocking access to compute for other tasks. This will reduce the overall throughput of assembly tasks, but they were having a disproportionate effect on resource sharing.
- Fixed an internal configuration issue that was preventing scaling of services.
- The front page erroneously had a check mark in the "AMR Prediction" column for SARS-CoV-2. It has been removed.
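
As an illustration of the MLST/cgMLST change noted above, where matches with an assigned code are preferred when a locus has more than one match, the selection rule might look roughly like the Python sketch below. It is not the Pathogenwatch implementation: the `LocusMatch` type, its fields, and the `select_matches` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocusMatch:
    """One hit against a locus (hypothetical structure for illustration)."""
    locus: str
    code: Optional[str]   # assigned allele code, or None for an uncoded allele
    sequence_id: str      # assumed stand-in identifier for the matched sequence


def select_matches(matches: List[LocusMatch]) -> List[LocusMatch]:
    """Keep only coded matches if any exist; otherwise keep every match."""
    coded = [m for m in matches if m.code is not None]
    return coded if coded else list(matches)


# Two hits on the same locus: only the one with an assigned code is kept.
hits = [
    LocusMatch("adk", None, "seq-a"),
    LocusMatch("adk", "12", "seq-b"),
]
selected = select_matches(hits)
```

If no match has an assigned code, the sketch leaves the list unchanged, since the release note only describes the case where at least one coded match exists.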
While the update is completed there will be some disruption to the service. This should end by 12pm.
- GFF downloads via the data tables in the collection views are working again.
- A bug was found that could lead to false positives appearing in the visualisation of AMR determinants from Kleborate in Klebsiella collections. The individual genome reports, AMR profiles in the collection view, and CSV downloads were unaffected. We have also not identified any examples of it happening with the current version of Kleborate.
- The description of the K/O locus from Kaptive/Kleborate in Klebsiella genome reports has been reworded for clarity.
- The E. coli MLST schemes have been extended to Shigella and all Shigella genomes updated with the new assignments.
- A bug was found in the assembly metrics analytics tool which caused contigs containing unusual characters in the first line of the record to be ignored. This mostly seems to have affected SARS-CoV-2 genomes. A fixed version (v3) has been created and deployed. All genome records are being updated and should be fixed within a day of this release.
- The public Klebsiella (and Raoultella) metadata was updated to use the same primary accession as the ENA, and with a more consistent set of sample metadata.
- The short read assembly service has been fully integrated into Pathogenwatch and is officially out of "Beta".
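
To make the assembly metrics fix above more concrete, here is a minimal Python sketch of a FASTA reader that counts a contig whenever a line begins with `>`, whatever other characters the header contains. How the Pathogenwatch tool actually parses records is not described in these notes, so the function names (`read_contig_lengths`, `n50`) and the choice of metrics are assumptions made for illustration only.

```python
from typing import List


def read_contig_lengths(fasta_text: str) -> List[int]:
    """Return the length of every contig, ignoring the content of header lines."""
    lengths: List[int] = []
    current = None  # None until the first header has been seen
    # Split on "\n" only, so unusual control characters inside a header
    # are never treated as line breaks.
    for line in fasta_text.split("\n"):
        line = line.rstrip("\r")
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest contig length such that contigs at least that long cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Headers with tabs, pipes and non-ASCII characters are still counted.
example = ">contig 1\tnote|x=1\nACGT\nAC\n>c2 \u00e9\nACGTACGT\n"
contig_lengths = read_contig_lengths(example)
```

With this input, `contig_lengths` holds one entry per header line, so neither contig is dropped because of its header.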