Announcements
News about upcoming changes or downtime.
25th October
New cgMLST and MLST schemes.
We have added 13 new cgMLST schemes, covering 2 genera and 11 species, enabling clustering and context searching for these species. We have also added 1 new MLST scheme. For full details, please see the release notes here.
Downtime 25th October
Database and MLST/cgMLST update 9am-1pm (approx.)
Apologies in advance for any inconvenience, but on Friday the 25th at 9am we will be switching off the website for several hours while we manage a database upgrade. At the same time we will also be updating all MLST and cgMLST assignments to the latest version. The task queues will be closed around an hour earlier to allow jobs to complete prior to shutdown.
3rd October
KlebNET K. pneumoniae neonatal sepsis collection
As part of the KlebNET consortium we are now hosting the KlebNET curated collection of Klebsiella pneumoniae samples from neonatal sepsis cases. The collection has been manually curated by the KlebNET consortium to support research into the microbial genomics of neonatal sepsis, a leading cause of infant mortality in low and middle-income countries.
1st October
Addition of the EuroGASP collection of N. gonorrhoeae strains
We are pleased to announce that a collection of the genomes from the EuroGASP 2020 N. gonorrhoeae surveillance study is now publicly available via this link. The genomes have also been included in the public set and added to cgMLST clustering searches.
11th September
Removal of N. gonorrhoeae duplicates from the public data set
We identified 7258 Neisseria gonorrhoeae samples with duplicate assemblies as a result of a missed filtering step during the last update. The extra copies have been removed from the public databases, but will still be accessible for collections that already include them. We have also merged the metadata to provide more complete records for those strains, and fixed the PubMed and DOI links for all N. gonorrhoeae.
25th July
FASTQ upload issues RESOLVED
We believe we have identified the root cause of the issue as a networking change we made on the 16th July. We have reverted this change and are not seeing any more failures. If you are still seeing your uploads blocked please contact us.
23rd July
FASTQ upload issues
We have had several reports that FASTQ uploads are often getting blocked and not proceeding. We are investigating the issue, and hope to have a resolution in place soon. We will update this announcement when the uploads are working fully again. If you wish to be notified directly please contact our support email.
8th July 2024
Downtime for server upgrade
At 9am on the 8th of July 2024 we will be taking down the website in order to upgrade the database server. This process may take several hours, so please bear with us during this time. It has been some time since our last upgrade and our user base has been rapidly growing. The upgrade should significantly improve the site's stability and its ability to handle larger groups of concurrent users.
Note that any genomes already queued for analysis will not be lost, and they will be processed once the website starts back up again.
20-21st January 2024
cgMLST profiles and clustering update
This weekend we will be updating the cgMLST profiles and cgMLST-based clustering for all Pathogenwatch genomes. This update brings our cgMLST assignment in line with the community standard set by PubMLST and should lead to improvements in the clustering. The site is expected to remain available at all times, but the processing queue may be closed for most of the weekend. We apologise in advance for any inconvenience.
3rd November
992 new Vibrio cholerae genomes from Vibriowatch
We are pleased to announce that, as part of the Vibriowatch project, we have added 992 more genomes to the public collection, making a total of 5,671 assembled genomes. We have also added a further 15 collections and updated one. Thanks to Avril Coghlan for manually curating and collating everything.
22nd September
cgMLST profiles and clustering update
This weekend we will be updating the cgMLST profiles and cgMLST-based clustering for all Pathogenwatch genomes. This update brings our cgMLST assignment in line with the community standard set by PubMLST and should lead to improvements in the clustering. The site is expected to remain available at all times, but the processing queue may be closed for most of the weekend. We apologise in advance for any inconvenience.
22nd September
1662 new public Candida auris genomes.
The C. auris public genome collection has more than doubled in size to a total of 2,680 assembled genomes. New genomes with Illumina paired end reads and either location or time metadata were downloaded from the ENA on the 21st of June and assembled at the CGPS.
All C. auris collections have been updated to add the new public genomes to the subtrees.
19th September
Updates to Vibrio cholerae.
We have released four new collections: Alam et al (2022); Angemeyer et al (2022); Irenge et al (2020); Wang et al (2020) of Vibrio cholerae, along with 57 newly public genomes, as part of the Vibriowatch project.
20th June
Updates to Klebsiella species
Firstly, we identified around 200 genomes which were erroneously included in the update on the 26th April, and should have been removed during the QC step. These genomes have been removed from the public collection, but will still be visible in collections that have already been created.
Secondly, we have assembled and made available another 800 genomes that were published by the ENA from April 2022 to March 2023, along with linked metadata including sample date and location.
As a result, there are now 12 Klebsiella africana, 1306 Klebsiella quasipneumoniae, 30 Klebsiella quasivariicola, and 1270 Klebsiella variicola assemblies available on Pathogenwatch.
9th June
Disruption due to update
We expect to deploy an update to the Candida auris core scheme on either Friday 4-5pm or Monday 4-5pm. This may cause some brief disruption while C. auris collections are rebuilt, and may mean some of these will not display correctly until this has been completed.
18th May
271 new public Salmonella Typhi genomes from South Africa
Thanks to our KlebNET collaboration, we have a new collection of 271 Salmonella Typhi assemblies with detailed curated annotation provided by the National Institute for Communicable Diseases of South Africa (NICD).
Visit the new collection via this link: https://pathogen.watch/collection/iub5by3x15ba-south-africa-nicd-typhi.
10th May
Queues will be closed for 2 hours from 3pm (UK)
We have a major update to the genome clustering process today that will necessitate closing the processing queues for up to 2 hours. The website should remain available throughout this time, though accessing clusters from genome reports may not be possible.
9th May
Downloading the public data is now easier
We've made it simpler to access complete downloads of the public data sets, including FASTAs, metadata and computed annotations. For more information, please visit the documentation pages "Public data downloads".
3rd May
N. gonorrhoeae annotation fix
We identified 32 public genomes that didn't have their name fields set. These are now fixed and should correctly appear in searches and collections.
26th April
More Klebsiella genomes added from the Kpn-complex
We have added 1,025 new Klebsiella genomes from the Kpn-complex to Pathogenwatch. The new additions include 578 K. variicola, 435 K. quasipneumoniae, 8 K. quasivariicola, and 4 K. africana.
21st April
Two new collections of Salmonella Typhi added
Thanks to our collaborators in TyphiNET we have 749 new genomes with detailed annotation based on two recent surveillance studies. Follow the links below to view them.
7th March
Phylogenetic trees, population search and AMR released for Candida auris
We are pleased to announce official support for C. auris as the first fungal pathogen in Pathogenwatch. In collaboration with Matthew Fisher and Johanna Rhodes, we have created an in-house core scheme, allowing the generation of phylogenetic trees, along with a resistance genotyping scheme. We have also included 1018 public genomes with sample date and location metadata, as well as the five complete reference genomes. These genomes will be included in the population subtrees in C. auris collections.
10th February
4,738 V. cholerae genomes now available
The Vibriowatch consortium have added a further 4,272 manually selected and annotated Vibrio cholerae assembled genomes to the public collection. Thanks especially to Avril Coghlan from Nick Thompson's team.
8th February
Latitude-longitude coordinates corrected for public N. gonorrhoeae
We have corrected a large number of inferred coordinates in the new set of N. gonorrhoeae public genomes (released 23rd November). In all cases the location reported in the metadata was correct. However, due to the polymorphic nature of the location descriptions, the geocoder we use to infer the lat-long coordinates was severely confused. The locations have been manually reviewed and fixed, and we now expect full agreement between the stated country and inferred coordinates.
31st January
7089 Salmonella Typhi genomes and 29 collections added to public dataset
This update with 7,089 genomes means that Pathogenwatch now holds the largest collection of curated S. Typhi genome assemblies to date (11,998 assembled genomes in all). Assemblies and linked metadata were provided by the Global Typhoid Genomics Consortium. Metadata provided includes date, country of isolation, country of origin inferred from travel information, isolation source (e.g. blood, sputum), and purpose of sampling (targeted or non-targeted), along with links to related records in the ENA. For more information please see https://www.medrxiv.org/content/10.1101/2022.12.28.22283969v.
30th January
Salmonella Typhi update news
We are expecting to remove subtrees and update the public Salmonella Typhi genome database tomorrow.
12th January
Salmonella Typhi subtrees to be removed
In order to allow the expansion of the public Typhi genome database we have to remove the "subtrees" generated for Typhi collections. Instead, we recommend using the clustering tool to identify genomic neighbours in the public and your uploaded genomes. We expect to remove this functionality early next week.
11th January
Campylobacter clusters not yet available.
We are having an issue with getting the Campylobacter clustering to run properly since the recent cgMLST update. We hope to have this resolved soon as no core changes have been made since the last time.
Note: This issue was resolved as of ~5pm UK time.
9th January 2023
Uploads temporarily suspended.
Unfortunately a release failed on Friday evening and we have to rerun it this morning. It will take a few hours as there are a lot of results to process, during which uploads, tree building and clustering will be suspended. We apologise for any inconvenience and any issues you may have experienced since Friday.
Announcements from 2022 below.
20th December
V. cholerae public genomes updated
We have deployed the first 449 genome slice of a curated collection of Vibrio cholerae assemblies from the Vibriowatch project. As part of this update we have removed the old public genome set, most of which should be replaced as part of the future updates. We have also added two new public collections based on the papers these genomes are described in.
12th December
New S. aureus genomes available
We have added 36,141 new genomes to the S. aureus public data set. This has necessitated removing subtrees from the S. aureus collections (see below). If you need to access one of these trees from an old collection, please contact us as all data is still present in the database. All genomes are available for search via the cgMLST profile clustering.
S. aureus removal of duplicate genomes
While preparing an update of over 35,000 new genomes we identified 146 duplicate samples in the current public data set. None of the duplicates were included in any public collections, so they have simply been removed from the public data. The records still exist, so any private collections that contain them will still load correctly.
Notification of removal of support for S. aureus subtrees
In order to enable the expansion of the public genome database, unfortunately we will have to disable the generation of reference subtrees in S. aureus collections. Instead, in order to search the genomic neighbourhood of a query genome and create a collection, you can use the cluster search method combined with the "List Genomes" button in the Genome Reports.
6th December
Shigella sonnei/E. coli species assignment correction
We've released a new version of Speciator that correctly differentiates between Shigella sonnei and E. coli. The previous version included mislabelled reference genomes, causing a disparate set of E. coli genomes to be identified as S. sonnei.
We are in the process of updating all user genomes to the correct species. We expect this to be complete by the 7th December. The update should have zero impact on any other assignments. Please note, as of the 2nd December update, all the public E. coli, S. sonnei and S. flexneri are correctly identified.
Before correction
After
2nd December
New Shigella sonnei
The 556 previous S. sonnei records have been removed and replaced with 13,211 new genomes with location and sample date, sourced from the ENA. 424 of the previous records are included in the new set, while the rest have been removed due to lack of quality control assessment or species assignment.
E. coli/S. sonnei speciation issue
We are aware that a small percentage (<3%) of E. coli genomes are incorrectly assigned as S. sonnei genomes. We have a new version of Speciator that fixes this in final testing and expect to deploy it next week. Previously incorrectly identified genomes will also be corrected and updated.
28th November
New Shigella flexneri genomes available
The previously unrepresented species of S. flexneri now has 9,435 genomes available in the public dataset. These can be searched using the E. coli cgMLST scheme clusters.
New Streptococcus pneumoniae genomes available
14,285 new S. pneumoniae samples have been added to the public data set. We identified 3,671 duplicate samples within the public data, which have been removed. We also removed 701 records that we were unable to verify met our quality control standards. We also replaced the representative assemblies for 1,332 of the previous public data set. In total, there are now 35,604 S. pneumoniae samples represented.
23rd November
New Helicobacter pylori genomes available
There are now 1,038 H. pylori genomes in the public data set - a species that was previously completely unrepresented.
New Enterobacter genomes available
Pathogenwatch now contains 5,699 Enterobacter genomes. 5,667 new genomes were added, replacing 306 of the previous genomes with improved assemblies, while 6 were removed for failing to pass QC.
New Neisseria gonorrhoeae genomes available
The latest data update contains 23,063 new N. gonorrhoeae genomes, bringing the total to 38,367.
17th November
New Campylobacter genomes available
Today we have added 21,379 C. coli genomes, and 51,187 C. jejuni genomes. The former's previous 47 public genomes, and the latter's 330 have been removed from the public database. All bar 12 C. jejuni and 2 C. coli genomes are represented in the new public data set. These 14 failed our updated QC thresholds.
14th November
New Haemophilus influenzae genomes available.
The latest update contains 3715 new H. influenzae genomes. The 32 previous genomes were removed.
New Enterococcus faecium genomes available
The latest update contains 15,553 new E. faecium genomes sourced from the ENA. Previously no genomes were available to search via cgMLST.
New Pseudomonas aeruginosa genomes available
As part of the ongoing update to the Pathogenwatch public database, we have assembled and added 13,400 new P. aeruginosa genomes from the ENA. Previously there was one genome, which has been removed.
11th November
New Acinetobacter baumannii genomes available
We have assembled 9,180 A. baumannii genomes, sourced from the ENA, with sample location and/or date attributes, and other metadata. These have been included in the public data set, and can be searched using the cgMLST clustering. The previous 20 have been removed since it was not possible to identify which run they were built from. They are still represented in the new data set.
New Klebsiella pneumoniae genomes available
We have also assembled an extra 16,556 K. pneumoniae genomes, to make it over 32,000 in the public data set. These have been included in the public data set, and can be searched using the cgMLST clustering. Thanks also to our KlebNet collaborators for their curation work.
14th October
Database slowdown during update
We are planning to deploy an update today that may have some impact on the database performance, and specifically browsing the complete genome list, while it is ongoing. It should take less than an hour, and won't affect processing genomes or viewing collections.
3rd October
Release complete
Apologies for the disruption on the 29th-31st. This was due to two reasons:
The update process was much slower than anticipated.
The initial fix for cgMLST clustering speeds caused an issue with other complex queries to the database, primarily breaking the Genome List view.
Everything should now be working at least as well as before, and cgMLST clustering is a lot quicker and more robust. Please let us know if there's anything we've missed.
29th September
Release day!
The update scheduled for today is a bit ahead of schedule and is going to begin this morning (UK am). The update should be complete by the end of the day, and we expect the site to remain browse-able, except for a couple of minutes. Please watch the announcements page and release notes for more information.
27th September
Downtime for update expected 29th September
We are currently preparing an update of all genomes to the latest MLST and cgMLST assignments. We expect the update to be ready for this Thursday afternoon (UK time). While we don't expect the website to be inaccessible for more than a minute or two, it is likely that updating the genomes will take a couple of hours and no user tasks will be run during this time - so genome uploads, tree building and clustering will be unavailable.
This update will also include a potential fix for the performance issues currently being observed for cgMLST clustering. If there are unexpected issues due to this change, there may also be some more brief disruption as it is reverted.
13th September
Server downtime scheduled for 5pm (UK)
Unfortunately we're still having some issues affecting genome clustering, and we haven't yet identified the root cause. As a result the service will be going down for an estimated 30 minutes at 5pm UK time today. We apologise in advance for any inconvenience.
7th September
Ongoing task processing issues
For an unknown reason, clustering tasks are freezing and causing the task queues to freeze as well. This does not appear to be related to any changes we have made, and could be the result of a 3rd party service provider, but the investigation is ongoing. We apologise for any inconvenience during this time.
26th July
Reduced support in August
Due to staff absences for August holidays, there is likely to be reduced support for answering questions and fixing problems. The site itself is pretty robust and "self-healing" so we don't anticipate any downtime during this period. Please do still get in touch if you find an issue or want to know something; it may just take a few days before there is a response.
15th July
Public Salmonella Typhi metadata updated
The metadata for the public collection of S. Typhi has been cleaned up and extended with extra fields from the ENA.
17th June
Klebsiella quasipneumoniae and K. variicola trees added
With the latest update, collections of K. quasipneumoniae and K. variicola will have a neighbour-joining tree built using a core genome developed in collaboration with KlebNet. All current collections of those species have been updated.
11th May
EuroGASP 2018 Neisseria gonorrhoeae genomes added
In conjunction with the release of "Europe-wide expansion and eradication of multidrug-resistant Neisseria gonorrhoeae lineages: a genomic surveillance study" by Leonor Sánchez-Busó et al. (The Lancet 2022), the 2,375 genomes of the EuroGASP 2018 structural survey have been added to the Pathogenwatch public genomes. You can also view them as an individual collection here: https://pathogen.watch/collection/eurogasp2018. The previous EuroGASP 2013 study can be found here: https://pathogen.watch/collection/eurogasp2013. Congratulations to all involved on an important piece of work.
5th May
Website issue
Resolved: We have noticed that switching to the metadata tab in some collections is causing the collection view to crash. We hope to have this fixed shortly.
28th April
New feature - DOI references
You can now assign literature references to genomes and collections using identifiers from the Digital Object Identifier (DOI) system, in addition to the previous support for PubMed identifiers. For more information, see the documentation on genome uploads and creating collections.
21st April
Processing Delays
Yesterday we had the convergence of a lot of uploads combined with a configuration error that prevented the scaling up of our compute systems. Apologies for any delays and issues with running tasks. Everything should now be processed.
13th-20th April
Reduced support
Due to holidays in the UK, there will be minimal engineering support for Pathogenwatch. While we don't expect any issues, and the system is pretty good at recovering itself, please be aware that in the event of downtime there may be some delay in bringing it back.
5th-6th April
Upload and processing delays
We are preparing a large data set for future inclusion in Pathogenwatch. Due to time constraints this is likely to have some impact on the speed of processing external tasks over the next 24 hours. We apologise for any disruption and hope to keep it to a minimum.
30th March
About yesterday's downtime and upgrade
Apologies for the extended period that Pathogenwatch was unavailable. There were a couple of mistakes made with regard to the robustness of the upgrade process, which we will be correcting for future infrastructure updates. However, the major issue was beyond our control: the developer doing the upgrade (me) lost power and internet (including mobile) for 5 hours, along with a significant chunk of east London, at a critical moment. We hope no one's work was too disrupted.
The good news is the database server has been significantly upgraded and the genome list view should now load considerably more reliably.