Uploading Genomes
Description of the upload page and file formats.
Free of charge, we provide the ability to analyse large numbers of new microbial assemblies in Pathogenwatch. To upload your own microbial pathogen assembly data, click on the 'Uploads' tab on the top right and follow the onscreen instructions. You will have the choice to upload three types of file:
1. One or more FASTAs, each containing a single genome (e.g. bacterial genomes);
2. One or more FASTAs containing one genome per record (e.g. a FASTA of multiple viral genomes);
3. Pairs of read files in FASTQ format.

Select the type of files you wish to upload

Drag and drop or select the "+" button to upload files
For single-genome FASTAs, sequences must be represented in standard IUPAC code (e.g. `ATCGATCGNA`). Each record represents a single contig in the assembly. The file name is used to name the genome by default and to link to a record in an accompanying metadata CSV. More than one file can be uploaded at a time, though we recommend small batches on slow or unstable internet connections.

For multi-genome FASTAs, sequences must be represented in standard IUPAC code (e.g. `ATCGATCGNA`). Each record represents an assembled or complete viral genome. The record header is used to name each genome by default and to link to records in an accompanying metadata CSV. More than one file can be uploaded at a time, though we ask users not to submit thousands of genomes at once, as this will impact other users.

You can also upload a limited number of pairs of FASTQ files for assembly using our in-house assembly pipeline. The default genome name is taken from the shared part of the filenames of the paired FASTQs. For more details about this pipeline, please see the technical documentation.
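Before uploading, it can be useful to check locally that every FASTA record uses only IUPAC nucleotide codes. The following is a minimal sketch, not Pathogenwatch's own validation: the helper names `read_fasta` and `invalid_records` are hypothetical, and the accepted alphabet is assumed to be the standard IUPAC nucleotide codes, since the exact set accepted by the upload page is not stated here.

```python
import io

# Assumption: the standard IUPAC nucleotide codes (including ambiguity codes).
# The exact set accepted by Pathogenwatch is not documented on this page.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def invalid_records(handle):
    """Map each record header to the sorted non-IUPAC characters it contains."""
    problems = {}
    for header, sequence in read_fasta(handle):
        bad = set(sequence.upper()) - IUPAC_NUCLEOTIDES
        if bad:
            problems[header] = sorted(bad)
    return problems


# Hypothetical two-record FASTA; the second record contains an invalid "X".
example = io.StringIO(">contig_1\nATCGATCGNA\n>contig_2\nATCGXATC\n")
print(invalid_records(example))  # {'contig_2': ['X']}
```

To check a real file, pass an open text handle instead, for example `invalid_records(open("assembly.fasta"))`, where the file name is a placeholder.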
Metadata files are accepted in CSV format, with a `.csv` file ending.
- One row per assembly.
- Rows are linked to the FASTA file by a column titled `filename`. For multi-genome FASTAs (e.g. viral genomes), put the identifier from each record header in this column.
- Provide a default name for an assembly with the column `displayname`.
- Geographical location is provided by columns titled `latitude` and `longitude`.
- Sample timestamps are recorded as three separate columns: `year`, `month`, `day`.
- Literature references can be provided as DOI system identifiers (e.g. ) or PubMed identifiers (e.g. ) in a column called `literaturelink`, or in two columns called `doi` and `pmid` respectively. If a column called `literaturelink` is provided, any columns called `doi` or `pmid` will be added to general user metadata instead and otherwise ignored.
We strongly recommend including at least when and where the sample was taken.
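The metadata columns described above can be assembled with a short script. This is only a sketch under stated assumptions: the file names, display names, coordinates and dates are invented placeholders, the helper `write_metadata` is hypothetical, and whether the `filename` value should include the file extension is assumed rather than confirmed here.

```python
import csv
import io

# Column names taken from the metadata description; literature columns are omitted.
COLUMNS = ["filename", "displayname", "latitude", "longitude", "year", "month", "day"]


def write_metadata(rows, handle):
    """Write one CSV row per assembly, leaving missing fields empty."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row.get(column, "") for column in COLUMNS})


# Placeholder records for illustration only; none of these values are real samples.
rows = [
    {"filename": "sample_01.fasta", "displayname": "Sample 01",
     "latitude": 51.5, "longitude": -0.12, "year": 2020, "month": 3, "day": 14},
    # Month and day are unknown for this sample, so they are left blank.
    {"filename": "sample_02.fasta", "displayname": "Sample 02",
     "latitude": -33.9, "longitude": 18.4, "year": 2019},
]

buffer = io.StringIO()
write_metadata(rows, buffer)
print(buffer.getvalue())
```

To produce a file for upload, write to a handle opened with `open("metadata.csv", "w", newline="")` instead of the in-memory buffer; the output name is a placeholder, but it should end in `.csv`.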
This option compresses the files prior to upload. On a fast connection it will have little benefit and may even slow the upload down, but it can significantly improve upload times on a slower connection.
If your connection regularly disconnects, then this will increase the chance that each file will be uploaded successfully. Compression should help as well in this case.
Select these options prior to dropping your files onto the page.
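As a rough illustration of why compressing before upload helps on slower connections, the sketch below measures how much a FASTA shrinks under gzip. The compression method used by the upload page is not stated here, so gzip is an assumption chosen for illustration, and the sequence is randomly generated rather than real data.

```python
import gzip
import random


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means smaller)."""
    return len(gzip.compress(data)) / len(data)


# Generate a reproducible, artificial 100,000-base sequence for the demonstration.
random.seed(0)
sequence = "".join(random.choice("ACGT") for _ in range(100_000))
fasta_bytes = (">contig_1\n" + sequence + "\n").encode("ascii")

ratio = compression_ratio(fasta_bytes)
print(f"Compressed size is {ratio:.0%} of the original {len(fasta_bytes)} bytes")
```

Fewer bytes need to be transferred after compression, at the cost of the time spent compressing, which is why the benefit depends on connection speed.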
The tasks being carried out and their individual progress are tracked in the bottom left corner. The overall progress and current stage are tracked on the top right, which also indicates when the upload is complete.
Once all tasks are complete, you can press the "View Genomes" button to view the results in your "Genomes" page. Individual uploads are tagged and listed in the bottom left corner.

A Successful Upload