The GPS Pipeline is a Nextflow pipeline designed for processing raw reads (FASTQ files) of Streptococcus pneumoniae samples. After preprocessing, the pipeline performs an initial assessment based on the total bases in the reads. Samples that pass are further assessed based on assembly, mapping, and taxonomy. If the sample passes all quality controls (QC), the pipeline also provides the sample's serotype, multi-locus sequence typing (MLST), lineage (based on the Global Pneumococcal Sequence Cluster (GPSC)), and antimicrobial resistance (AMR) against multiple antimicrobials.
The pipeline is designed to be easy to set up and use, and is suitable for use on local machines and high-performance computing (HPC) clusters alike. Additionally, the pipeline only downloads essential files to enable the analysis, and no data is uploaded from the local environment, making it an ideal option for cases where the FASTQ files being analysed are confidential. After initialisation or the first successful complete run, the pipeline can be used offline unless you have changed the selection of any database or container image.
The development of this pipeline is part of the GPS Project (Global Pneumococcal Sequencing Project).
If you have used the GPS Pipeline in your research, please cite us in your relevant publications:
Harry C. H. Hung, Narender Kumar, Victoria Dyster, Corin Yeats, Benjamin Metcalf, Yuan Li, Paulina A. Hawkins, Lesley McGee, Stephen D. Bentley, and Stephanie W. Lo. A Portable and Scalable Genomic Analysis Pipeline for Streptococcus pneumoniae Surveillance: GPS Pipeline. bioRxiv 2024.11.27.625679 [Preprint]. doi: 10.1101/2024.11.27.625679
Note
A Quickstart Guide is available here. Still, we highly recommend reading the Usage, Pipeline Options, and Output sections for a comprehensive understanding.
- A POSIX-compatible operating system (e.g. Linux, macOS, Windows with WSL) with Bash 3.2 or later
  - Installation guide for WSL on Windows by Microsoft
- Java 11 or later (up to 22) (OpenJDK/Oracle Java)
  - Installation guide for OpenJDK by freeCodeCamp
- Docker or Singularity/Apptainer
  - Installation guides:
    - For Linux
      - Docker Engine on Linux by Docker (must install `docker-compose-plugin` as per the guide)
      - Apptainer on Linux by Apptainer
      - (Not recommended) Docker Desktop for Linux, it is known to cause permission issues on Linux, which could prevent the pipeline from working
    - For macOS
      - Docker Desktop on macOS by Docker
        - need to allow Docker to access enough system resources, especially CPU and Memory
    - For Windows with WSL
      - Docker Desktop on Windows with WSL by Docker
It is recommended to have at least 16GB of RAM and 100GB of free storage
Note
- The pipeline core files use ~5MB
- All default databases use ~19GB in total
- All Docker images use ~13GB in total; alternatively, Singularity images use ~4.5GB in total
- The pipeline generates ~1.8GB intermediate files for each sample on average
  - These files can be removed when the pipeline run is completed; please refer to Clean Up
  - To further reduce the storage requirement by sacrificing the ability to resume the pipeline, please refer to Experimental
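
As a rough worked example of the figures in this note, the Python sketch below adds them up for a run of a given size. The function name `estimate_storage_gb` and its parameters are illustrative and not part of the pipeline; the constants are the approximate sizes quoted above, and the size of the final output (assemblies and reports) is not included.

```python
# Approximate sizes quoted in the note above, in gigabytes.
PIPELINE_CORE_GB = 0.005          # ~5MB of pipeline core files
DATABASES_GB = 19.0               # all default databases
DOCKER_IMAGES_GB = 13.0           # all Docker images
SINGULARITY_IMAGES_GB = 4.5       # all Singularity images
INTERMEDIATE_PER_SAMPLE_GB = 1.8  # average intermediate files per sample


def estimate_storage_gb(sample_count, engine="docker"):
    """Hypothetical helper: rough storage estimate for `sample_count` samples."""
    if sample_count < 0:
        raise ValueError("sample_count must not be negative")
    if engine == "docker":
        images = DOCKER_IMAGES_GB
    elif engine == "singularity":
        images = SINGULARITY_IMAGES_GB
    else:
        raise ValueError(f"unknown engine: {engine}")
    fixed = PIPELINE_CORE_GB + DATABASES_GB + images
    # Intermediate files grow linearly with the number of samples.
    return fixed + INTERMEDIATE_PER_SAMPLE_GB * sample_count


if __name__ == "__main__":
    for n in (10, 100):
        print(f"{n} samples with Docker: ~{estimate_storage_gb(n):.1f} GB")
```

By this estimate, 100GB of free storage covers roughly 37 samples with Docker before the intermediate files need to be cleaned up; actual usage varies with read depth.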
- Only Illumina paired-end short reads are supported
- Each sample is expected to be a pair of raw reads following this file name pattern: `*_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}`
  - example 1: `SampleName_R1_001.fastq.gz`, `SampleName_R2_001.fastq.gz`
  - example 2: `SampleName_1.fastq.gz`, `SampleName_2.fastq.gz`
  - example 3: `SampleName_R1.fq`, `SampleName_R2.fq`
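
To check in advance which files in a directory listing would be recognised as pairs, the file name pattern above can be approximated with a regular expression. The sketch below is an illustration only: the regex is a hand translation of the glob, `pair_reads` is a hypothetical helper, and the pipeline itself relies on Nextflow for pairing, which may differ in edge cases.

```python
import re
from collections import defaultdict

# Regex translation of the glob *_{,R}{1,2}{,_001}.{fq,fastq}{,.gz}
# (an approximation written for this example, not taken from the pipeline code).
READ_NAME = re.compile(
    r"^(?P<sample>.+)_R?(?P<read>[12])(?:_001)?\.(?:fq|fastq)(?:\.gz)?$"
)


def pair_reads(filenames):
    """Hypothetical helper: group file names into {sample: (read_1, read_2)}."""
    found = defaultdict(dict)
    for name in filenames:
        match = READ_NAME.match(name)
        if match:  # names that do not follow the pattern are ignored
            found[match.group("sample")][match.group("read")] = name
    # Keep only samples for which both reads are present.
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in found.items()
        if "1" in reads and "2" in reads
    }


if __name__ == "__main__":
    names = [
        "SampleName_R1_001.fastq.gz", "SampleName_R2_001.fastq.gz",
        "Other_1.fastq.gz", "Other_2.fastq.gz",
        "Lonely_R1.fq", "notes.txt",
    ]
    for sample, (r1, r2) in sorted(pair_reads(names).items()):
        print(sample, r1, r2)
```

Files without a partner, such as `Lonely_R1.fq` in the demo list, are dropped rather than reported, so an empty result is a hint to check the naming of the input files.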
Warning
- Docker or Singularity must be running
- An Internet connection is required
- Clone the repository (if Git is installed on your system): `git clone https://github.com/GlobalPneumoSeq/gps-pipeline.git`
  - or download and unzip/extract the latest release
- Go into the local directory of the pipeline and it is ready to use without installation (the directory name might be different): `cd gps-pipeline`
- (Optional) You could perform an initialisation to download all required additional files and container images, so the pipeline can be used at any time with or without the Internet afterwards.
  - Using Docker as the container engine: `./run_pipeline --init`
  - Using Singularity as the container engine: `./run_pipeline --init -profile singularity`
Warning
- Docker or Singularity must be running
- If this is the first run and initialisation was not performed, an Internet connection is required
Note
By default, Docker is used as the container engine and all the processes are executed by the local machine. See Profile for details on running the pipeline with Singularity or on an HPC cluster.
- You can run the pipeline without options. It will attempt to get the raw reads from the default location (i.e. the `input` directory inside the `gps-pipeline` local directory): `./run_pipeline`
- You can also specify the location of the raw reads by adding the `--reads` option: `./run_pipeline --reads /path/to/raw-reads-directory`
- For a test run, you could obtain a small test dataset by running the included `download_test_input` script. The dataset will be saved to the `test_input` directory inside the pipeline local directory. You can then run the pipeline on the test data: `./download_test_input` followed by `./run_pipeline --reads test_input`
  - `9870_5#52` will fail the Taxonomy QC and hence Overall QC, so it will have no analysis results
  - `17175_7#59` and `21127_1#156` should pass Overall QC, so they will have analysis results
Tip
`-profile` is a built-in Nextflow option, it only has one leading `-`
- By default, Docker is used as the container engine and all the processes are executed by the local machine. To change this, you could use Nextflow's built-in `-profile` option to switch to other available profiles: `./run_pipeline -profile [profile name]`
- Available profiles:

Profile Name | Details |
---|---|
`standard` (Default) | Docker is used as the container engine. Processes are executed locally. |
`singularity` | Singularity is used as the container engine. Processes are executed locally. |
`lsf` | The pipeline should be launched from an LSF cluster head node with this profile. Singularity is used as the container engine. Processes are submitted to your LSF cluster via `bsub` by the pipeline. (Tested on Wellcome Sanger Institute farm5 LSF cluster only) (The default of option `--kraken2_memory_mapping` changes to `false`.) |
Tip
`-resume` is a built-in Nextflow option, it only has one leading `-`
- If the pipeline is interrupted mid-run, Nextflow's built-in `-resume` option can be used to resume the pipeline execution instead of starting from scratch again
- You should use the same command as the original run, only adding `-resume` at the end (i.e. all pipeline options should be identical)
  - If the original command is `./run_pipeline --reads /path/to/raw-reads-directory`
  - The command to resume the pipeline execution should be `./run_pipeline --reads /path/to/raw-reads-directory -resume`
- During the run of the pipeline, Nextflow generates a considerable amount of intermediate files
- If the run has been completed and you do not intend to use the `-resume` option or those intermediate files, you can remove the intermediate files using one of the following ways:
  - Run the included `clean_pipeline` script: `./clean_pipeline`
    - It runs the commands in manual removal for you
    - It removes the `work` directory and log files within the `gps-pipeline` local directory
  - Manual removal
    - Remove the `work` directory and log files within the `gps-pipeline` local directory: `rm -rf work` and `rm -rf .nextflow.log*`
  - Run the `nextflow clean` command: `./nextflow clean`
    - This built-in command cleans up cache and work directories
    - By default, it only cleans up the latest run
    - For details and available options of `nextflow clean`, refer to the Nextflow documentation
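
If the clean-up needs to be triggered from a Python script rather than a shell, the manual removal described above can be sketched as follows. The helper `clean_pipeline_dir` is hypothetical and not shipped with the pipeline; it only mirrors the two `rm -rf` commands and, like them, deletes data irreversibly.

```python
import shutil
from pathlib import Path


def clean_pipeline_dir(pipeline_dir):
    """Hypothetical helper mirroring `rm -rf work` and `rm -rf .nextflow.log*`.

    Returns the sorted names of the entries that were removed.
    """
    root = Path(pipeline_dir)
    targets = [root / "work", *root.glob(".nextflow.log*")]
    removed = []
    for target in targets:
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)  # real directory: remove it recursively
        elif target.exists() or target.is_symlink():
            target.unlink()        # regular file or symlink
        else:
            continue               # nothing to remove, as with rm -f
        removed.append(target.name)
    return sorted(removed)
```

A call such as `clean_pipeline_dir("gps-pipeline")` assumes the default directory name; after it runs, `-resume` can no longer reuse the previous results.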
The pipeline is compatible with Launchpad of Seqera Platform (previously known as Nextflow Tower) and the Nextflow `-with-tower` option. For more information, please refer to the Seqera Platform documentation.
- The tables below contain the available options that can be used when you run the pipeline
- Usage: `./run_pipeline [option] [value]`
Tip
- To permanently change the value of an option, edit the `nextflow.config` file inside the `gps-pipeline` local directory.
- `$projectDir` is a Nextflow built-in implicit variable; it is defined as the local directory of `gps-pipeline`.
- Pipeline options are not built-in Nextflow options; they are led by `--` instead of `-`
Option | Values | Description |
---|---|---|
`--init` | `true` or `false` (Default: `false`) | Use alternative workflow for initialisation, which means downloading all required additional files and container images, and creating databases. Can be enabled by including `--init` without value. |
`--version` | `true` or `false` (Default: `false`) | Use alternative workflow for showing versions of pipeline, container images, tools and databases. Can be enabled by including `--version` without value. (This workflow pulls the required container images if they are not yet available locally) |
`--help` | `true` or `false` (Default: `false`) | Show help message. Can be enabled by including `--help` without value. |
Warning
- `--output` overwrites existing results in the target directory if there are any
- `--db` does not accept user-provided local databases; the directory content will be overwritten
Option | Values | Description |
---|---|---|
`--reads` | Any valid path (Default: `"$projectDir/input"`) | Path to the input directory that contains the reads to be processed. |
`--output` | Any valid path (Default: `"$projectDir/output"`) | Path to the output directory where the results are saved. |
`--db` | Any valid path (Default: `"$projectDir/databases"`) | Path to the directory where the databases used by the pipeline are saved. |
`--assembly_publish` | `"link"` or `"symlink"` or `"copy"` (Default: `"link"`) | Method used by Nextflow to publish the generated assemblies. (The default setting `"link"` means hard link, and will therefore fail if the output directory is set to outside of the working file system) |
Note
- Read QC does not have directly accessible parameters
- The minimum base count in reads of Read QC is based on the multiplication of `--length_low` and `--depth` of Assembly QC (i.e. default value is `38000000`)
Option | Values | Description |
---|---|---|
`--spneumo_percentage` | Any integer or float value (Default: `60.00`) | Minimum S. pneumoniae percentage in reads to pass Taxonomy QC. |
`--non_strep_percentage` | Any integer or float value (Default: `2.00`) | Maximum non-Streptococcus genus percentage in reads to pass Taxonomy QC. |
`--ref_coverage` | Any integer or float value (Default: `60.00`) | Minimum reference coverage percentage by the reads to pass Mapping QC. |
`--het_snp_site` | Any integer value (Default: `220`) | Maximum non-cluster heterozygous SNP (Het-SNP) site count to pass Mapping QC. |
`--contigs` | Any integer value (Default: `500`) | Maximum contig count in assembly to pass Assembly QC. |
`--length_low` | Any integer value (Default: `1900000`) | Minimum assembly length to pass Assembly QC. |
`--length_high` | Any integer value (Default: `2300000`) | Maximum assembly length to pass Assembly QC. |
`--depth` | Any integer or float value (Default: `20.00`) | Minimum sequencing depth to pass Assembly QC. |
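
To make the interaction of these thresholds concrete, the Python sketch below applies the default values to one sample's metrics and derives the Read QC minimum from `--length_low` and `--depth`. It is an illustration under stated assumptions, not the pipeline's own code: the helper names, the dictionary keys of `sample`, and the demo metrics are invented here, and the comparisons are taken to be inclusive.

```python
# Default thresholds copied from the option table above.
DEFAULTS = {
    "spneumo_percentage": 60.0,
    "non_strep_percentage": 2.0,
    "ref_coverage": 60.0,
    "het_snp_site": 220,
    "contigs": 500,
    "length_low": 1_900_000,
    "length_high": 2_300_000,
    "depth": 20.0,
}


def read_qc_min_bases(params=DEFAULTS):
    """Minimum base count for Read QC: --length_low multiplied by --depth."""
    return params["length_low"] * params["depth"]


def assess(sample, params=DEFAULTS):
    """Hypothetical helper: return True/False per QC step for one sample.

    `sample` is a dict of metrics with illustrative key names.
    """
    return {
        "read": sample["bases"] >= read_qc_min_bases(params),
        "assembly": (
            sample["contigs"] <= params["contigs"]
            and params["length_low"] <= sample["assembly_length"] <= params["length_high"]
            and sample["seq_depth"] >= params["depth"]
        ),
        "mapping": (
            sample["ref_cov"] >= params["ref_coverage"]
            and sample["het_snp"] <= params["het_snp_site"]
        ),
        "taxonomy": (
            sample["spneumo"] >= params["spneumo_percentage"]
            and sample["top_non_strep"] <= params["non_strep_percentage"]
        ),
    }


if __name__ == "__main__":
    # Invented metrics for illustration only.
    demo = {
        "bases": 50_000_000, "contigs": 80, "assembly_length": 2_100_000,
        "seq_depth": 24.0, "ref_cov": 85.0, "het_snp": 30,
        "spneumo": 95.0, "top_non_strep": 0.5,
    }
    print(read_qc_min_bases(), assess(demo))
```

Passing a modified copy of `DEFAULTS` shows how changing `--length_low` or `--depth` also shifts the Read QC minimum.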
Tip
- The output of a SPAdes-based assembler is deterministic for a given count of threads
- Using `--assembler_thread` with a specific value can guarantee the generated assemblies will be reproducible for others using the same value
Option | Values | Description |
---|---|---|
`--assembler` | `"shovill"` or `"unicycler"` (Default: `"shovill"`) | The SPAdes-based assembler used to assemble the reads. |
`--assembler_thread` | Any integer value (Default: `0`) | Number of threads used by the assembler. `0` means all available. |
`--min_contig_length` | Any integer value (Default: `500`) | Minimum length of contig to be included in the assembly. |
Option | Values | Description |
---|---|---|
`--ref_genome` | Any valid path to a `.fa` or `.fasta` file (Default: `"$projectDir/data/ATCC_700669_v1.fa"`) | Path to the reference genome for mapping. |

Option | Values | Description |
---|---|---|
`--kraken2_db_remote` | Any valid URL to a Kraken2 database in `.tar.gz` or `.tgz` format (Default: Minikraken v1) | URL to a Kraken2 database. |
`--kraken2_memory_mapping` | `true` or `false` (Default: `true`) | Whether to use the memory mapping option of Kraken2. `true` means not loading the database into RAM, suitable for memory-limited or fast storage environments. |

Option | Values | Description |
---|---|---|
`--seroba_db_remote` | Any valid URL to a SeroBA release in `.tar.gz` or `.tgz` format (Default: SeroBA v2.0.4) | URL to a SeroBA release. |
`--seroba_kmer` | Any integer value (Default: `71`) | Kmer size for creating the KMC database of SeroBA. |

Option | Values | Description |
---|---|---|
`--poppunk_db_remote` | Any valid URL to a PopPUNK database in `.tar.gz` or `.tgz` format (Default: GPS v9) | URL to a PopPUNK database. |
`--poppunk_ext_remote` | Any valid URL to a PopPUNK external clusters file in `.csv` format (Default: GPS v9 GPSC Designation) | URL to a PopPUNK external clusters file. |

Option | Values | Description |
---|---|---|
`--ariba_ref` | Any valid path to a `.fa` or `.fasta` file (Default: `"$projectDir/data/ariba_ref_sequences.fasta"`) | Path to the reference sequences for preparing ARIBA database. |
`--ariba_metadata` | Any valid path to a `tsv` file (Default: `"$projectDir/data/ariba_metadata.tsv"`) | Path to the metadata file for preparing ARIBA database. |
`--resistance_to_mic` | Any valid path to a `tsv` file (Default: `"$projectDir/data/resistance_to_MIC.tsv"`) | Path to the resistance category to MIC (minimum inhibitory concentration) lookup table. |
Note
This section is only valid when Singularity is used as the container engine
Option | Values | Description |
---|---|---|
`--singularity_cachedir` | Any valid path (Default: `"$projectDir/singularity_cache"`) | Path to the directory where Singularity images should be saved to. |
Option | Values | Description |
---|---|---|
`--lite` | `true` or `false` (Default: `false`) | Reduce storage requirement by removing intermediate `.sam` and `.bam` files once they are no longer needed while the pipeline is still running. The quantity of reduction of storage requirement cannot be guaranteed. Can be enabled by including `--lite` without value. |
- By default, the pipeline outputs the results into the `output` directory inside the `gps-pipeline` local directory
- It can be changed by adding the option `--output`: `./run_pipeline --output /path/to/output-directory`
The following directories and files are output into the output directory
Directory / File | Description |
---|---|
`assemblies` | This directory contains all assemblies (`.fasta`) generated by the pipeline |
`results.csv` | This file contains all the information generated by the pipeline on each sample |
`info.txt` | This file contains information regarding the pipeline and parameters of the run |
Note
- The output fields in `Other AMR` and `Virulence` types depend on the provided ARIBA reference sequences and metadata file, and resistance category to MIC lookup table; the table below is based on the defaults.
- The inferred Minimum Inhibitory Concentration (MIC) range of an antimicrobial in `Other AMR` type is only provided if it is included in the resistance category to MIC lookup table. The default lookup table is based on 2014 CLSI guidelines.
- For resistance category: `S` = Sensitive/Susceptible; `I` = Intermediate; `R` = Resistant
- For virulence genes: `POS` = Positive; `NEG` = Negative
Tip
- If the `Overall_QC` result for a sample is `READ_ONE_CORRUPTED`, `READ_TWO_CORRUPTED`, or both, the corresponding read file is found to be corrupted (e.g., an incomplete/damaged Gzip file or mismatches in read length and quality-score length). You may want to reacquire the read file from its source or discard the sample if the source file is also corrupted.
- If the `Overall_QC` result for a sample is `PREPROCESS MODULE FAILURE`, `ASSEMBLY MODULE FAILURE`, `MAPPING MODULE FAILURE`, `TAXONOMY MODULE FAILURE`, or any combination of these, it indicates that a tool in the corresponding QC module crashed while processing the reads.
  - For `ASSEMBLY MODULE FAILURE`, you might be able to process the sample using another assembler.
- If any in silico typing result for a sample is `MODULE FAILURE`, it means the corresponding tool crashed while attempting to process the sample.
The following fields can be found in the output results.csv
Field | Type | Description |
---|---|---|
`Sample_ID` | Identification | Sample ID based on the raw reads file name |
`Read_QC` | QC | Read quality control result |
`Assembly_QC` | QC | Assembly quality control result |
`Mapping_QC` | QC | Mapping quality control result |
`Taxonomy_QC` | QC | Taxonomy quality control result |
`Overall_QC` | QC | Overall quality control result (Based on `Assembly_QC`, `Mapping_QC` and `Taxonomy_QC`) |
`Bases` | Read | Number of bases in the reads (Default: ≥ 38 Mb to pass Read QC) |
`Contigs#` | Assembly | Number of contigs in the assembly (Default: ≤ 500 to pass Assembly QC) |
`Assembly_Length` | Assembly | Total length of the assembly (Default: 1.9 - 2.3 Mb to pass Assembly QC) |
`Seq_Depth` | Assembly | Sequencing depth of the assembly (Default: ≥ 20x to pass Assembly QC) |
`Ref_Cov_%` | Mapping | Percentage of reference covered by reads (Default: ≥ 60% to pass Mapping QC) |
`Het-SNP#` | Mapping | Non-cluster heterozygous SNP (Het-SNP) site count (Default: ≤ 220 to pass Mapping QC) |
`S.Pneumo_%` | Taxonomy | Percentage of reads assigned to Streptococcus pneumoniae (Default: ≥ 60% to pass Taxonomy QC) |
`Top_Non-Strep_Genus` | Taxonomy | The most abundant non-Streptococcus genus in reads |
`Top_Non-Strep_Genus_%` | Taxonomy | Percentage of reads assigned to the most abundant non-Streptococcus genus (Default: ≤ 2% to pass Taxonomy QC) |
`GPSC` | Lineage | GPSC Lineage |
`Serotype` | Serotype | Serotype |
`ST` | MLST | Sequence Type (ST) |
`aroE` | MLST | Allele ID of aroE |
`gdh` | MLST | Allele ID of gdh |
`gki` | MLST | Allele ID of gki |
`recP` | MLST | Allele ID of recP |
`spi` | MLST | Allele ID of spi |
`xpt` | MLST | Allele ID of xpt |
`ddl` | MLST | Allele ID of ddl |
`pbp1a` | PBP AMR | Allele ID of pbp1a |
`pbp2b` | PBP AMR | Allele ID of pbp2b |
`pbp2x` | PBP AMR | Allele ID of pbp2x |
`AMO_MIC` | PBP AMR | Estimated minimum inhibitory concentration (MIC) of amoxicillin (AMO) |
`AMO_Res` | PBP AMR | Inferred resistance category against AMO |
`CFT_MIC` | PBP AMR | Estimated MIC of ceftriaxone (CFT) |
`CFT_Res(Meningital)` | PBP AMR | Inferred resistance category against CFT in meningital form |
`CFT_Res(Non-meningital)` | PBP AMR | Inferred resistance category against CFT in non-meningital form |
`TAX_MIC` | PBP AMR | Estimated MIC of cefotaxime (TAX) |
`TAX_Res(Meningital)` | PBP AMR | Inferred resistance category against TAX in meningital form |
`TAX_Res(Non-meningital)` | PBP AMR | Inferred resistance category against TAX in non-meningital form |
`CFX_MIC` | PBP AMR | Estimated MIC of cefuroxime (CFX) |
`CFX_Res` | PBP AMR | Inferred resistance category against CFX |
`MER_MIC` | PBP AMR | Estimated MIC of meropenem (MER) |
`MER_Res` | PBP AMR | Inferred resistance category against MER |
`PEN_MIC` | PBP AMR | Estimated MIC of penicillin (PEN) |
`PEN_Res(Meningital)` | PBP AMR | Inferred resistance category against PEN in meningital form |
`PEN_Res(Non-meningital)` | PBP AMR | Inferred resistance category against PEN in non-meningital form |
`CHL_MIC` | Other AMR | Inferred MIC of Chloramphenicol (CHL) |
`CHL_Res` | Other AMR | Predicted resistance category against CHL |
`CHL_Determinant` | Other AMR | Known determinants that predicted the CHL resistance category |
`CLI_MIC` | Other AMR | Inferred MIC of Clindamycin (CLI) |
`CLI_Res` | Other AMR | Predicted resistance category against CLI |
`CLI_Determinant` | Other AMR | Known determinants that predicted the CLI resistance category |
`COT_MIC` | Other AMR | Inferred MIC of Co-Trimoxazole (COT) |
`COT_Res` | Other AMR | Predicted resistance category against COT |
`COT_Determinant` | Other AMR | Known determinants that predicted the COT resistance category |
`DOX_MIC` | Other AMR | Inferred MIC of Doxycycline (DOX) |
`DOX_Res` | Other AMR | Predicted resistance category against DOX |
`DOX_Determinant` | Other AMR | Known determinants that predicted the DOX resistance category |
`ERY_MIC` | Other AMR | Inferred MIC of Erythromycin (ERY) |
`ERY_Res` | Other AMR | Predicted resistance category against ERY |
`ERY_Determinant` | Other AMR | Known determinants that predicted the ERY resistance category |
`ERY_CLI_Res` | Other AMR | Predicted resistance category against Erythromycin (ERY) and Clindamycin (CLI) |
`ERY_CLI_Determinant` | Other AMR | Known determinants that predicted the ERY and CLI resistance category |
`FQ_Res` | Other AMR | Predicted resistance category against Fluoroquinolones (FQ) |
`FQ_Determinant` | Other AMR | Known determinants that predicted the FQ resistance category |
`KAN_Res` | Other AMR | Predicted resistance category against Kanamycin (KAN) |
`KAN_Determinant` | Other AMR | Known determinants that predicted the KAN resistance category |
`LFX_MIC` | Other AMR | Inferred MIC of Levofloxacin (LFX) |
`LFX_Res` | Other AMR | Predicted resistance category against LFX |
`LFX_Determinant` | Other AMR | Known determinants that predicted the LFX resistance category |
`RIF_MIC` | Other AMR | Inferred MIC of Rifampin (RIF) |
`RIF_Res` | Other AMR | Predicted resistance category against RIF |
`RIF_Determinant` | Other AMR | Known determinants that predicted the RIF resistance category |
`SMX_Res` | Other AMR | Predicted resistance category against Sulfamethoxazole (SMX) |
`SMX_Determinant` | Other AMR | Known determinants that predicted the SMX resistance category |
`TET_MIC` | Other AMR | Inferred MIC of Tetracycline (TET) |
`TET_Res` | Other AMR | Predicted resistance category against TET |
`TET_Determinant` | Other AMR | Known determinants that predicted the TET resistance category |
`TMP_Res` | Other AMR | Predicted resistance category against Trimethoprim (TMP) |
`TMP_Determinant` | Other AMR | Known determinants that predicted the TMP resistance category |
`VAN_MIC` | Other AMR | Inferred MIC of Vancomycin (VAN) |
`VAN_Res` | Other AMR | Predicted resistance category against VAN |
`VAN_Determinant` | Other AMR | Known determinants that predicted the VAN resistance category |
`PILI1` | Virulence | Expression of PILI-1 |
`PILI1_Determinant` | Virulence | Known determinants that predicted the PILI-1 expression |
`PILI2` | Virulence | Expression of PILI-2 |
`PILI2_Determinant` | Virulence | Known determinants that predicted the PILI-2 expression |
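
Because `results.csv` is a plain CSV file with the fields listed above, it can be post-processed with standard tools. The Python sketch below selects the samples that passed Overall QC using only the standard library. The helper `passed_samples` is hypothetical, the demo rows are invented, and the label `"PASS"` is an assumption about how a passing result is written, so check the values in your own file before relying on it.

```python
import csv
import io


def passed_samples(csv_file, pass_value="PASS"):
    """Hypothetical helper: rows whose Overall_QC equals `pass_value`.

    "PASS" is an assumed label, not confirmed by this document.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["Overall_QC"] == pass_value]


if __name__ == "__main__":
    # Invented rows for illustration; a real file has many more columns.
    example = io.StringIO(
        "Sample_ID,Overall_QC,GPSC,Serotype,ST\n"
        "sample_a,PASS,1,19F,236\n"
        "sample_b,FAIL,,,\n"
    )
    for row in passed_samples(example):
        print(row["Sample_ID"], row["GPSC"], row["Serotype"], row["ST"])
```

For a real run, open the file with `open("output/results.csv", newline="")` (assuming the default output directory) and pass the handle to `passed_samples`.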
This project uses open-source components. You can find the homepage or source code of their open-source projects along with license information below. I acknowledge and am grateful to these developers for their contributions to open source.
- ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads. Hunt M, Mather AE, Sánchez-Busó L, Page AJ, Parkhill J, Keane JA, Harris SR. Microbial Genomics 2017. doi: 10.1099/mgen.0.000131
- License (GPL-3.0): https://github.com/sanger-pathogens/ariba/blob/master/LICENSE
- This tool is used in `GET_ARIBA_DB` and `OTHER_RESISTANCE` processes of the `amr.nf` module
- Twelve years of SAMtools and BCFtools. Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008
- Licenses
- BCFtools (MIT/Expat or GPL-3.0): https://github.com/samtools/bcftools/blob/develop/LICENSE
- SAMtools (MIT/Expat): https://github.com/samtools/samtools/blob/develop/LICENSE
- These tools are used in `SAM_TO_SORTED_BAM` and `SNP_CALL` processes of the `mapping.nf` module
- Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]
- License (GPL-3.0): https://github.com/lh3/bwa/blob/master/COPYING
- This tool is used in `GET_REF_GENOME_BWA_DB` and `MAPPING` processes of the `mapping.nf` module
Docker Images of ARIBA, BCFtools, BWA, fastp, Kraken 2, mlst, PopPUNK, QUAST, SAMtools, Shovill, Unicycler
- State Public Health Bioinformatics Workgroup (@StaPH-B)
- License (GPL-3.0): https://github.com/StaPH-B/docker-builds/blob/master/LICENSE
- These Docker images provide containerised environments with different bioinformatics tools for processes of multiple modules
Docker Image of network-multitool
- Wbitt - We Bring In Tomorrow's Technologies (@WBITT)
- License (MIT): https://github.com/wbitt/Network-MultiTool/blob/master/LICENSE
- This Docker image provides the containerised environment with Bash tools for processes of multiple modules
- Alexander Mancevice (@amancevice)
- License (MIT): https://github.com/amancevice/docker-pandas/blob/main/LICENSE
- This Docker image provides the containerised environment with Python and Pandas for `GENERATE_OVERALL_REPORT` process of the `output.nf` module, `HET_SNP_COUNT` process of the `mapping.nf` module and `PARSE_OTHER_RESISTANCE` process of the `amr.nf` module
- Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884-i890, https://doi.org/10.1093/bioinformatics/bty560
- License (MIT): https://github.com/OpenGene/fastp/blob/master/LICENSE
- This tool is used in `PREPROCESS` process of the `preprocess.nf` module
- Victoria Dyster (@blue-moon22)
- License (GPL-3.0): https://github.com/sanger-bentley-group/GPSC_pipeline_nf/blob/master/LICENSE
- Code adapted into the `get_lineage.sh` script
- Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0
- License (MIT): https://github.com/DerrickWood/kraken2/blob/master/LICENSE
- This tool is used in `TAXONOMY` process of the `taxonomy.nf` module
- Narender Kumar (@kumarnaren)
- License (GPL-3.0): https://github.com/kumarnaren/mecA-HetSites-calculator/blob/master/LICENSE
- Code was rewritten into the `het_snp_count.py` script
- Torsten Seemann (@tseemann)
- License (GPL-2.0): https://github.com/tseemann/mlst/blob/master/LICENSE
- Incorporates components of the PubMLST database
- This tool is used in `MLST` process of the `mlst.nf` module
- P. Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316-319 (2017) doi:10.1038/nbt.3820
- License (Apache 2.0): https://github.com/nextflow-io/nextflow/blob/master/COPYING
- This project is a Nextflow pipeline; Nextflow executable `nextflow` is included in this repository
- Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research 29:1-13 (2019). doi:10.1101/gr.241455.118
- License (Apache 2.0): https://github.com/bacpop/PopPUNK/blob/master/LICENSE
- This tool is used in `LINEAGE` process of the `lineage.nf` module
- Alla Mikheenko, Andrey Prjibelski, Vladislav Saveliev, Dmitry Antipov, Alexey Gurevich, Versatile genome assembly evaluation with QUAST-LG, Bioinformatics (2018) 34 (13): i142-i150. doi: 10.1093/bioinformatics/bty266. First published online: June 27, 2018
- License (GPL-2.0): https://github.com/ablab/quast/blob/master/LICENSE.txt
- This tool is used in `ASSEMBLY_ASSESS` process of the `assembly.nf` module
- SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data. Epping L, van Tonder, AJ, Gladstone RA, GPS Consortium, Bentley SD, Page AJ, Keane JA, Microbial Genomics 2018, doi: 10.1099/mgen.0.000186
- License (GPL-3.0): https://github.com/sanger-pathogens/seroba/blob/master/LICENSE
- This project uses a Docker image of a fork
- The fork provides SeroBA with the latest updates as the original repository is no longer maintained
- The Docker image provides the containerised environment with SeroBA for `GET_SEROBA_DB` and `SEROTYPE` processes of the `serotype.nf` module
- Narender Kumar (@kumarnaren)
- License (GPL-3.0): https://github.com/kumarnaren/resistanceDatabase/blob/main/LICENSE
- `sequences.fasta` is renamed to `ariba_ref_sequences.fasta` and modified
- `metadata.tsv` is renamed to `ariba_metadata.tsv` and modified
- The files are used as the default inputs of `GET_ARIBA_DB` process of the `amr.nf` module
- Torsten Seemann (@tseemann)
- License (GPL-3.0): https://github.com/tseemann/shovill/blob/master/LICENSE
- This tool is used in `ASSEMBLY_SHOVILL` process of the `assembly.nf` module
SPN-PBP-AMR (CDC PBP AMR Predictor)
- Pathogenwatch (@pathogenwatch-oss)
- License (MIT): https://github.com/pathogenwatch-oss/spn-resistance-pbp/blob/main/LICENSE
- This is a modified version of the AMR predictor by Ben Metcalf (@BenJamesMetcalf) at the Centers for Disease Control and Prevention (CDC)
- This project uses a Docker image of a fork
- The fork changes the Docker image from a Docker executable image to a Docker environment for Nextflow integration
- The Docker image provides the containerised environment with SPN-PBP-AMR for `PBP_RESISTANCE` process of the `amr.nf` module
- Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 2017.
- License (GPL-3.0): https://github.com/rrwick/Unicycler/blob/main/LICENSE
- This tool is used in `ASSEMBLY_UNICYCLER` process of the `assembly.nf` module