Announcements
News about upcoming changes or downtime.
Last updated
We have added 13 new cgMLST schemes, covering 2 genera and 11 species, enabling clustering and context searching for these species. We have also added 1 new MLST scheme. For full details, please see the .
Apologies in advance for any inconvenience, but on Friday the 25th at 9am we will be switching off the website for several hours while we carry out a database upgrade. At the same time, we will also update all MLST and cgMLST assignments to the latest version. The task queues will be closed around an hour earlier to allow jobs to complete before the shutdown.
As part of the KlebNET consortium, we are now hosting the curated collection of Klebsiella pneumoniae samples from neonatal sepsis cases. The collection has been manually curated by the KlebNET consortium to support research into the microbial genomics of neonatal sepsis, a leading cause of infant mortality in low- and middle-income countries.
We are pleased to announce that a collection of the genomes from the EuroGASP 2020 N. gonorrhoeae surveillance study is now publicly available via this . The genomes have also been included in the and added to cgMLST clustering searches.
We identified 7,258 Neisseria gonorrhoeae samples with duplicate assemblies as a result of a missed filtering step during the last update. The extra copies have been removed from the public databases, but will still be accessible in collections that already include them. We have also merged the metadata to provide more complete records for those strains, and fixed the PubMed and DOI links for all N. gonorrhoeae.
FASTQ upload issues RESOLVED
FASTQ upload issues
Downtime for server upgrade
At 9am on the 8th of July 2024 we will be taking down the website in order to upgrade the database server. This process may take several hours, so please bear with us during this time. It has been some time since our last upgrade, and our user base has been growing rapidly. The upgrade should significantly improve the site's stability and its ability to handle larger numbers of concurrent users.
Note that any genomes already queued for analysis will not be lost, and they will be processed once the website starts back up again.
This weekend we will be updating the cgMLST profiles and cgMLST-based clustering for all Pathogenwatch genomes. This update brings our cgMLST assignment in line with the community standard set by PubMLST and should lead to improvements in the clustering. The site is expected to remain available at all times, but the processing queue may be closed for most of the weekend. We apologise in advance for any inconvenience.
All C. auris collections have been updated to add the new public genomes to the subtrees.
Secondly, we have assembled and made available another 800 genomes that were published by the ENA from April 2022 to March 2023, along with linked metadata including sample date and location.
As a result, there are now 12 Klebsiella africana, 1306 Klebsiella quasipneumoniae, 30 Klebsiella quasivariicola, and 1270 Klebsiella variicola assemblies available on Pathogenwatch.
We expect to deploy an update to the Candida auris core scheme on either Friday 4-5pm or Monday 4-5pm. This may cause some brief disruption while C. auris collections are rebuilt, and some collections may not display correctly until the rebuild is complete.
Thanks to our KlebNET collaboration, we have a new collection of 271 Salmonella Typhi assemblies with detailed curated annotation provided by the National Institute of Communicable Disease of South Africa (NICD).
We are deploying a major update to the genome clustering process today that will require closing the processing queues for up to 2 hours. The website should remain available throughout this time, though accessing clusters from genome reports may not be possible.
We identified 32 public genomes that didn't have their name fields set. These are now fixed and should correctly appear in searches and collections.
We have added 1,025 new Klebsiella genomes from the Kpn-complex to Pathogenwatch. The new additions include 578 K. variicola, 435 K. quasipneumoniae, 8 K. quasivariicola, and 4 K. africana.
Thanks to our collaborators in TyphiNET we have 749 new genomes with detailed annotation based on two recent surveillance studies. Follow the links below to view them.
We are pleased to announce official support for C. auris as the first fungal pathogen in Pathogenwatch. In collaboration with Matthew Fisher and Johanna Rhodes, we have created an in-house core scheme, allowing the generation of phylogenetic trees, along with a resistance genotyping scheme. We have also included 1,018 public genomes with sample date and location metadata, as well as the five complete reference genomes. These genomes will be included in the population subtrees in C. auris collections.
The Vibriowatch consortium have added a further 4,272 manually selected and annotated Vibrio cholerae assembled genomes to the public collection. Thanks especially to Avril Coghlan from Nick Thompson's team.
We have corrected a large number of inferred coordinates in the new set of N. gonorrhoeae public genomes (released 23rd November). In all cases the location reported in the metadata was correct. However, due to the inconsistent format of the location descriptions, the geocoder we use to infer the latitude and longitude coordinates was severely confused. The locations have been manually reviewed and fixed, and we now expect full agreement between the stated country and the inferred coordinates.
We expect to remove subtrees and update the public Salmonella Typhi genome database tomorrow.
In order to allow the expansion of the public Typhi genome database, we have to remove the "subtrees" generated for Typhi collections. Instead, we recommend using the clustering tool to identify genomic neighbours among the public genomes and your uploaded genomes. We expect to remove this functionality early next week.
Note: This issue was resolved as of ~5pm UK time.
Unfortunately, a release failed on Friday evening and we have had to rerun it this morning. It will take a few hours as there are a lot of results to process, during which uploads, tree building and clustering will be suspended. We apologise for any inconvenience and any issues you may have experienced since Friday.
We have deployed the first 449-genome slice of a curated collection of Vibrio cholerae assemblies from the Vibriowatch project. As part of this update we have removed the old public genome set, most of which should be replaced in future updates. We have also added two new public collections based on the papers in which these genomes are described.
While preparing an update of over 35,000 new genomes, we identified 146 duplicate samples in the current public data set. None of the duplicates were included in any public collections, so they have simply been removed from the public data. The records still exist, so any private collections that contain them will still load correctly.
We are in the process of updating all user genomes to the correct species. We expect this to be complete by the 7th December. The update should have no impact on any other assignments. Please note that, as of the 2nd December update, all the public E. coli, S. sonnei and S. flexneri genomes are correctly identified.
The 556 previous S. sonnei records have been removed and replaced with 13,211 new genomes, with location and sample date sourced from the ENA. 424 of the previous records are included in the new set, while the rest have been removed due to a lack of quality control assessment or species assignment.
We are aware that a small percentage (<3%) of E. coli genomes are incorrectly assigned as S. sonnei. A new version of speciator that fixes this is in final testing, and we expect to deploy it next week. Previously misidentified genomes will also be corrected and updated.
The previously unrepresented species S. flexneri now has 9,435 genomes available in the public data set. These can be searched using the E. coli cgMLST scheme clusters.
14,285 new S. pneumoniae samples have been added to the public data set. We identified 3,671 duplicate samples within the public data, which have been removed. We also removed 701 records that we were unable to verify met our quality control standards. We also replaced the representative assemblies for 1,332 of the previous public data set. In total, there are now 35,604 S. pneumoniae samples represented.
There are now 1,038 H. pylori genomes in the public data set - a species that was previously completely unrepresented.
Pathogenwatch now contains 5,699 Enterobacter genomes. 5,667 new genomes were added, replacing 306 of the previous genomes with improved assemblies, while 6 were removed for failing to pass QC.
The latest data update contains 23,063 new N. gonorrhoeae genomes, bringing the total to 38,367.
Today we have added 21,379 C. coli genomes and 51,187 C. jejuni genomes. The 47 previous public C. coli genomes and the 330 previous public C. jejuni genomes have been removed from the public database. All but 12 C. jejuni and 2 C. coli genomes are represented in the new public data set; these 14 failed our updated QC thresholds.
The latest update contains 3715 new H. influenzae genomes. The 32 previous genomes were removed.
The latest update contains 15,553 new E. faecium genomes sourced from the ENA. Previously no genomes were available to search via cgMLST.
As part of the ongoing update to the Pathogenwatch public database, we have assembled and added 13,400 new P. aeruginosa genomes from the ENA. Previously there was one genome, which has been removed.
We are planning to deploy an update today that may have some impact on the database performance, and specifically browsing the complete genome list, while it is ongoing. It should take less than an hour, and won't affect processing genomes or viewing collections.
Apologies for the disruption on the 29th-31st. This had two causes:
The update process was much slower than anticipated.
The initial fix for cgMLST clustering speeds caused an issue with other complex queries to the database, primarily breaking the Genome List view.
Everything should now be working at least as well as before, and cgMLST clustering is a lot quicker and more robust. Please let us know if there's anything we've missed.
The update scheduled for today is a bit ahead of schedule and will begin this morning (UK time). The update should be complete by the end of the day, and we expect the site to remain browsable, except for a couple of minutes. Please watch the announcements page and release notes for more information.
We are currently preparing an update of all genomes to the latest MLST and cgMLST assignments. We expect the update to be ready for this Thursday afternoon (UK time). While we don't expect the website to be inaccessible for more than a minute or two, it is likely that updating the genomes will take a couple of hours and no user tasks will be run during this time - so genome uploads, tree building and clustering will be unavailable.
This update will also include a potential fix for the performance issues currently being observed with cgMLST clustering. If this change causes unexpected issues, there may be further brief disruption while it is reverted.
Unfortunately we're still having some issues affecting genome clustering, and we haven't yet identified the root cause. As a result the service will be going down for an estimated 30 minutes at 5pm UK time today. We apologise in advance for any inconvenience.
For an unknown reason, clustering tasks are freezing and causing the task queues to freeze as well. This does not appear to be related to any changes we have made, and could be caused by a third-party service provider, but the investigation is ongoing. We apologise for any inconvenience during this time.
Due to staff absences for August holidays, there is likely to be reduced support for answering questions and fixing problems. The site itself is pretty robust and "self-healing" so we don't anticipate any downtime during this period. Please do still get in touch if you do find an issue or want to know something, it may just take a few days before there is a response.
The metadata for the public collection of S. Typhi has been cleaned up and extended with extra fields from the ENA.
Yesterday a large surge of uploads coincided with a configuration error that prevented our compute systems from scaling up. Apologies for any delays and issues with running tasks. Everything should now be processed.
Due to holidays in the UK, there will be minimal engineering support for Pathogenwatch. While we don't expect any issues, and the system is pretty good at recovering itself, please be aware that in the event of downtime there may be some delay in bringing it back.
We are preparing a large data set for future inclusion in Pathogenwatch. Due to time constraints this is likely to have some impact on the speed of processing external tasks over the next 24 hours. We apologise for any disruption and hope to keep it to a minimum.
Apologies for the extended period during which Pathogenwatch was unavailable. A couple of mistakes were made with regard to the robustness of the upgrade process, which we will be correcting for future infrastructure updates. However, the major issue was beyond our control: the developer doing the upgrade (me) lost power and internet (including mobile) for 5 hours, along with a significant chunk of east London, at a critical moment. We hope no one's work was too disrupted.
The good news is the database server has been significantly upgraded and the genome list view should now load considerably more reliably.
We believe we have identified the root cause of the issue as a networking change we made on the 16th July. We have reverted this change and are no longer seeing any failures. If you are still seeing your uploads blocked .
We have had several reports that FASTQ uploads are often getting blocked and not proceeding. We are investigating the issue, and hope to have a resolution in place soon. We will update this announcement when the uploads are working fully again. If you wish to be notified directly .
We are pleased to announce that, as part of the Vibriowatch project, we have added , making a total of 5,671 assembled genomes. We have also added a further . Thanks to Avril Coghlan for manually curating and collating everything.
The has more than doubled in size to a total of 2,680 assembled genomes. New genomes with Illumina paired-end reads and either location or time metadata were downloaded from the ENA on the 21st of June and assembled at the CGPS.
We have released four new collections: ; ; ; of Vibrio cholerae, along with , as part of the Vibriowatch project.
Firstly, we identified around 200 genomes which were erroneously included in the update on the , and should have been removed during the QC step. These genomes have been removed from the public collection, but will still be visible in collections that have already been created.
Visit the new collection via this link: .
We've made it simpler to access complete downloads of the public data sets, including FASTAs, metadata and computed annotations. For more information, please visit the documentation pages .
This update of 7,089 genomes means that Pathogenwatch now holds the largest collection of curated S. Typhi genome assemblies to date (11,998 assembled genomes in all). Assemblies and linked metadata were provided by the . Metadata provided includes date, country of isolation, country of origin inferred from with travel information, isolation source (e.g. blood, sputum), and purpose of sampling (targeted or non-targeted), along with links to related records in the ENA. For more information please see .
We have been having issues getting Campylobacter clustering to run properly since . We hope to have this resolved soon, as no core changes have been made since it last ran.
We have added 36,141 new genomes to the S. aureus public data set. This has necessitated removing subtrees from the S. aureus collections (). If you need to access one of these trees from an old collection, please as all data is still present in the database. All genomes are available for search via the cgMLST profile clustering.
In order to enable the expansion of the public genome database, unfortunately we will have to disable the generation of in S. aureus collections. Instead, in order to search the genomic neighbourhood of a query genome and create a collection, you can use the cluster search method combined with the "List Genomes" button in the.
We have a new version of that correctly differentiates between Shigella sonnei and E. coli. The previous version included mislabelled reference genomes, causing a disparate set of E. coli genomes to be identified as S. sonnei.
We have assembled 9,180 A. baumannii genomes, sourced from the , with sample location and/or date attributes and other metadata. These have been included in the public data set, and can be searched using the . The previous 20 have been removed, since it was not possible to identify which run they were built from. They are still represented in the new data set.
We have also assembled an extra 16,556 K. pneumoniae genomes, to make it over 32,000 in the public data set. These have been included in the public data set, and can be searched using the . Thanks also to our collaborators for their curation work.
With the latest , collections of K. quasipneumoniae and K. variicola will have a neighbour-joining tree built using a core genome developed in collaboration with . All current collections of those species have been updated.
In conjunction with the release of , the 2,375 genomes of the EuroGASP 2018 structural survey have been added to the . You can also view them as an individual collection here: . The previous EuroGASP 2013 study can be found here: . Congratulations to all involved on an important piece of work.
We have noticed that switching to the metadata tab in some collections is causing the collection view to crash. We hope to have this fixed shortly.
You can now assign literature references to genomes and collections using Digital Object Identifier (DOI) identifiers, in addition to the existing support for PubMed identifiers. For more information, see the documentation and .