News about upcoming changes or downtime.
The C. auris public genome collection has more than doubled in size to a total of 2,680 assembled genomes. New genomes with Illumina paired-end reads and either location or time metadata were downloaded from the ENA on the 21st of June and assembled at the CGPS.
All C. auris collections have been updated to add the new public genomes to the subtrees.
Firstly, we identified around 200 genomes which were erroneously included in the update on the 26th April, and should have been removed during the QC step. These genomes have been removed from the public collection, but will still be visible in collections that have already been created.
Secondly, we have assembled and made available another 800 genomes that were published by the ENA from April 2022 to March 2023, along with linked metadata including sample date and location.
As a result, there are now 12 Klebsiella africana, 1306 Klebsiella quasipneumoniae, 30 Klebsiella quasivariicola, and 1270 Klebsiella variicola assemblies available on Pathogenwatch.
We expect to deploy an update to the Candida auris core scheme on either Friday 4-5pm or Monday 4-5pm. This may cause some brief disruption while C. auris collections are rebuilt, and may mean some of these will not display correctly until this has been completed.
Thanks to our KlebNET collaboration, we have a new collection of 271 Salmonella Typhi assemblies with detailed curated annotation provided by the National Institute of Communicable Disease of South Africa (NICD).
We have a major update to the genome clustering process today that will necessitate closing the processing queues for up to 2 hours. The website should remain available throughout this time, though accessing clusters from genome reports may not be possible.
We've made it simpler to access complete downloads of the public data sets, including FASTAs, metadata and computed annotations. For more information, please visit the documentation pages "Public data downloads".
We identified 32 public genomes that didn't have their name fields set. These are now fixed and should correctly appear in searches and collections.
We have added 1,025 new Klebsiella genomes from the Kpn-complex to Pathogenwatch. The new additions include 578 K. variicola, 435 K. quasipneumoniae, 8 K. quasivariicola, and 4 K. africana.
Thanks to our collaborators in TyphiNET we have 749 new genomes with detailed annotation based on two recent surveillance studies. Follow the links below to view them.
We are pleased to announce official support for C. auris as the first fungal pathogen in Pathogenwatch. In collaboration with Matthew Fisher and Johanna Rhodes, we have created an in-house core scheme, allowing the generation of phylogenetic trees, along with a resistance genotyping scheme. We have also included 1018 public genomes with sample date and location metadata, as well as the five complete reference genomes. These genomes will be included in the population subtrees in C. auris collections.
The Vibriowatch consortium have added a further 4,272 manually selected and annotated Vibrio cholerae assembled genomes to the public collection. Thanks especially to Avril Coghlan from Nick Thompson's team.
We have corrected a large number of inferred coordinates in the new set of N. gonorrhoeae public genomes (released 23rd November). In all cases the location reported in the metadata was correct. However, because the location descriptions vary so much in form, the geocoder we use to infer the lat-long coordinates was severely confused. The locations have been manually reviewed and fixed, and we now expect full agreement between the stated country and inferred coordinates.
This update with 7,089 genomes means that Pathogenwatch now holds the largest collection of curated S. Typhi genome assemblies to date (11,998 assembled genomes in all). Assemblies and linked metadata were provided by the Global Typhoid Genomics Consortium. Metadata provided includes date, country of isolation, country of origin inferred from travel information, isolation source (e.g. blood, sputum), and purpose of sampling (targeted or non-targeted), along with links to related records in the ENA. For more information please see https://www.medrxiv.org/content/10.1101/2022.12.28.22283969v.
We are expecting to remove subtrees and update the public Salmonella Typhi genome database tomorrow.
In order to allow the expansion of the public Typhi genome database we have to remove the "subtrees" generated for Typhi collections. Instead, we recommend using the clustering tool to identify genomic neighbours in the public and your uploaded genomes. We expect to remove this functionality early next week.
We are having an issue with getting the Campylobacter clustering to run properly since the recent cgMLST update. We hope to have this resolved soon as no core changes have been made since the last time.
Note: This issue was resolved as of ~5pm UK time.
Unfortunately a release failed on Friday evening and we have to rerun it this morning. It will take a few hours as there are a lot of results to process, during which uploads, tree building and clustering will be suspended. We apologise for any inconvenience and any issues you may have experienced since Friday.
Announcements from 2022 below.
We have deployed the first 449-genome slice of a curated collection of Vibrio cholerae assemblies from the Vibriowatch project. As part of this update we have removed the old public genome set, most of which should be replaced as part of future updates. We have also added two new public collections based on the papers these genomes are described in.
We have added 36,141 new genomes to the S. aureus public data set. This has necessitated removing subtrees from the S. aureus collections (see below). If you need to access one of these trees from an old collection, please contact us as all data is still present in the database. All genomes are available for search via the cgMLST profile clustering.
While preparing an update of over 35,000 new genomes we identified 146 duplicate samples in the current public data set. None of the duplicates were included in any public collections, so they have simply been removed from the public data. The records still exist, so any private collections that contain them will still load correctly.
In order to enable the expansion of the public genome database, unfortunately we will have to disable the generation of reference subtrees in S. aureus collections. Instead, in order to search the genomic neighbourhood of a query genome and create a collection, you can use the cluster search method combined with the "List Genomes" button in the Genome Reports.
We are in the process of updating all user genomes to the correct species. We expect this to be complete by the 7th December. The update should have zero impact on any other assignments. Please note, as of the 2nd December update, all the public E. coli, S. sonnei and S. flexneri are correctly identified.
The 556 previous S. sonnei records have been removed and replaced with 13,211 new genomes with location and sample date, sourced from the ENA. 424 of the previous records are included in the new set, while the rest have been removed due to lack of quality control assessment or species assignment.
We are aware that a small percentage (<3%) of E. coli genomes are incorrectly assigned as S. sonnei genomes. We have a new version of speciator that fixes this in final testing and expect to deploy it next week. Previously incorrectly identified genomes will also be corrected and updated.
The previously unrepresented species S. flexneri now has 9,435 genomes available in the public dataset. These can be searched using the E. coli cgMLST scheme clusters.
14,285 new S. pneumoniae samples have been added to the public data set. We identified 3,671 duplicate samples within the public data, which have been removed. We also removed 701 records that we were unable to verify met our quality control standards. We also replaced the representative assemblies for 1,332 of the samples in the previous public data set. In total, there are now 35,604 S. pneumoniae samples represented.
There are now 1,038 H. pylori genomes in the public data set - a species that was previously completely unrepresented.
Pathogenwatch now contains 5,699 Enterobacter genomes. 5,667 new genomes were added, replacing 306 of the previous genomes with improved assemblies, while 6 were removed for failing to pass QC.
The latest data update contains 23,063 new N. gonorrhoeae genomes, bringing the total to 38,367.
Today we have added 21,379 C. coli genomes, and 51,187 C. jejuni genomes. The former's previous 47 public genomes, and the latter's 330 have been removed from the public database. All bar 12 C. jejuni and 2 C. coli genomes are represented in the new public data set. These 14 failed our updated QC thresholds.
The latest update contains 3715 new H. influenzae genomes. The 32 previous genomes were removed.
The latest update contains 15,553 new E. faecium genomes sourced from the ENA. Previously no genomes were available to search via cgMLST.
As part of the ongoing update to the Pathogenwatch public database, we have assembled and added 13,400 new P. aeruginosa genomes from the ENA. Previously there was one genome, which has been removed.
We have assembled 9,180 A. baumannii genomes, sourced from the ENA, with sample location and/or date attributes, and other metadata. These have been included in the public data set, and can be searched using the cgMLST clustering. The previous 20 have been removed since it was not possible to identify which run they were built from. They are still represented in the new data set.
We are planning to deploy an update today that may have some impact on the database performance, and specifically browsing the complete genome list, while it is ongoing. It should take less than an hour, and won't affect processing genomes or viewing collections.
Apologies for the disruption on the 29th-31st. This was due to two reasons:
- 1. The update process was much slower than anticipated.
- 2. The initial fix for cgMLST clustering speeds caused an issue with other complex queries to the database, primarily breaking the Genome List view.
Everything should now be working at least as well as before, and cgMLST clustering is a lot quicker and more robust. Please let us know if there's anything we've missed.
The update scheduled for today is a bit ahead of schedule and is going to begin this morning (UK time). The update should be complete by the end of the day, and we expect the site to remain browsable, except for a couple of minutes. Please watch the announcements page and release notes for more information.
We are currently preparing an update of all genomes to the latest MLST and cgMLST assignments. We expect the update to be ready for this Thursday afternoon (UK time). While we don't expect the website to be inaccessible for more than a minute or two, it is likely that updating the genomes will take a couple of hours and no user tasks will be run during this time - so genome uploads, tree building and clustering will be unavailable.
This update will also include a potential fix for the performance issues currently being observed for cgMLST clustering. If there are unexpected issues due to this change, there may also be some more brief disruption as it is reverted.
Unfortunately we're still having some issues affecting genome clustering, and we haven't yet identified the root cause. As a result the service will be going down for an estimated 30 minutes at 5pm UK time today. We apologise in advance for any inconvenience.
For an unknown reason, clustering tasks are freezing and causing the task queues to freeze as well. This does not appear to be related to any changes we have made, and could be the result of a 3rd party service provider, but the investigation is ongoing. We apologise for any inconvenience during this time.
Due to staff absences for August holidays, there is likely to be reduced support for answering questions and fixing problems. The site itself is pretty robust and "self-healing" so we don't anticipate any downtime during this period. Please do still get in touch if you find an issue or want to know something; it may just take a few days before there is a response.
The metadata for the public collection of S. Typhi has been cleaned up and extended with extra fields from the ENA.
In conjunction with the release of "Europe-wide expansion and eradication of multidrug-resistant Neisseria gonorrhoeae lineages: a genomic surveillance study" by Leonor Sánchez-Busó et al. (The Lancet 2022), the 2,375 genomes of the EuroGASP 2018 structural survey have been added to the Pathogenwatch public genomes. You can also view them as an individual collection here: https://pathogen.watch/collection/eurogasp2018. The previous EuroGASP 2013 study can be found here: https://pathogen.watch/collection/eurogasp2013. Congratulations to all involved on an important piece of work.
Yesterday we had the convergence of a lot of uploads combined with a configuration error that prevented the scaling up of our compute systems. Apologies for any delays and issues with running tasks. Everything should now be processed.
Due to holidays in the UK, there will be minimal engineering support for Pathogenwatch. While we don't expect any issues, and the system is pretty good at recovering itself, please be aware that in the event of downtime there may be some delay in bringing it back.
We are preparing a large data set for future inclusion in Pathogenwatch. Due to time constraints this is likely to have some impact on the speed of processing external tasks over the next 24 hours. We apologise for any disruption and hope to keep it to a minimum.
Apologies for the extended period that Pathogenwatch was unavailable. There were a couple of mistakes made with regard to the robustness of the upgrade process, which we will be correcting for future infrastructure updates. However, the major issue was beyond our control - the developer doing the upgrade (me) lost power and internet (including mobile) for 5 hours, along with a significant chunk of east London, at a critical moment. We hope no one's work was too disrupted.
The good news is the database server has been significantly upgraded and the genome list view should now load considerably more reliably.