This repository contains the RASE pipeline for rapid inference of antibiotic resistance and susceptibility using genomic neighbor typing. Other components of the software include the core RASE package and the RASE DB skeleton. For more information, see the RASE paper and the RASE supplementary materials.
- Introduction
- Quick example
- Installation
- Running RASE
- Files and directories
- Related repositories
- License
- Contact
This repository contains the RASE prediction pipeline for rapid inference of antibiotic resistance and susceptibility using genomic neighbor typing. For more information about the method and results, see the associated paper and supplementary repository.
The entire pipeline is specified within a single Snakemake Snakefile, which can be executed using GNU Make (see below).
The following example predicts antibiotic resistance from a sputum metagenomic sample and the pneumococcal database.
After installing the dependencies, run the commands below. The entire computation should require approximately 6 minutes on a standard laptop (MacBook Pro).
# clone this repository
git clone --recursive https://github.com/c2-d2/rase-pipeline
# enter the directory
cd rase-pipeline
# run the pipeline on test data (S. pneumoniae database
# and nanopore reads from a metagenomic experiment)
make test
- Installing dependencies. See RASE computational environment.
- Cloning the RASE prediction pipeline. You can clone this repository using git
  git clone --recursive https://github.com/c2-d2/rase-pipeline
  or download it as a single .tar.gz file.
- Installing a RASE database. A RASE database should be placed into database/. Every database consists of two files: a compressed k-mer index (.tar.gz) and a table with metadata for individual database strains (.tsv). To be detected properly, both files must have the same base name.
- Placing nanopore reads. Nanopore reads should be placed into reads/ as a single .fq/.fastq/.fa/.fasta file (possibly gzipped) per sequencing experiment.
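The pairing rule for database files can be checked before running the pipeline. The following is a minimal sketch, not part of RASE: the function name list_complete_dbs is hypothetical, and it assumes that the index carries the .tar.gz extension and the metadata table the .tsv extension.

```shell
# Hypothetical helper (not part of RASE): print the base name of every
# database in a directory whose index (<db>.tar.gz) has a matching
# metadata table (<db>.tsv); warn on stderr about unpaired indexes.
list_complete_dbs() {
    local dir="${1:-database}" index base
    for index in "$dir"/*.tar.gz; do
        [ -e "$index" ] || continue            # the glob matched nothing
        base="$(basename "$index" .tar.gz)"
        if [ -f "$dir/$base.tsv" ]; then
            echo "$base"
        else
            echo "warning: $dir/$base.tsv is missing" >&2
        fi
    done
}
```

Calling list_complete_dbs without an argument inspects database/ in the current directory.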
Upon execution using make, the pipeline analyzes data in the following steps:
- Detection. The pipeline detects user-provided RASE database(s) (in database/) and read sets (in reads/), and generates all <db>-<reads> experiments.
- Data preparation. The pipeline decompresses the detected databases and sorts reads by time of sequencing. When time information is not available in the read headers, it is estimated from the number of processed base pairs, assuming a constant flow (in kbp per second).
- Matching. Reads are matched against the databases using ProPhyle, and the computed nearest neighbors for each read are stored in matching/ in the RASE-BAM format.
- Prediction. Phenotypes are predicted from the computed nearest neighbors.
- Plotting. The computed time characteristics of prediction are plotted.
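The time estimate used in the data preparation step can be illustrated with a short shell sketch. It is not the pipeline's actual implementation: the function name estimate_times, the input layout (one read length in bp per line, in sequencing order), the caller-supplied rate in bp per second, and the choice to stamp each read with the time at which its last base was processed are all assumptions made for illustration.

```shell
# Hypothetical sketch (not the RASE implementation): estimate a time
# stamp for each read from the cumulative number of processed base pairs,
# assuming a constant flow.
# Usage: estimate_times RATE_BP_PER_SEC < read_lengths.txt
# Input:  one read length (bp) per line, in sequencing order.
# Output: read index and estimated time in seconds, tab-separated.
estimate_times() {
    awk -v rate="$1" '{
        total += $1                              # cumulative base pairs
        printf "%d\t%.1f\n", NR, total / rate    # time = bp / (bp per s)
    }'
}
```

For example, printf '1000\n3000\n' | estimate_times 500 assigns 2.0 s to the first read and 8.0 s to the second.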
- make - Run everything.
- make cluster - Submit jobs to a cluster.
- make export - Export all outputs to rase_results.tar.
- make clean - Clean plots and prediction files.
- make cleanall - Clean all output files (including bam files and logs).
- make replot - Re-plot all figures.
- make test - Run the smallest experiment only. If database or reads are not present, download examples. For testing and debugging purposes.
Input files:
- database/ - Source database files.
  - Each database consists of two files: <db>.tar.gz and <db>.tsv.
- reads/ - Nanopore reads (<reads>.{fq,fastq,fa,fasta}{.gz,}).

Output files:
- matching/
  - <reads>.fa - Nanopore reads sorted by time.
  - <reads>__<db>.bam - Read matches to the reference strains in RASE/BAM.
- prediction/ - Prediction files.
  - <reads>__<db>/<timestamp>.tsv - Cumulative weights calculated for all strains from the database at the time; h1 corresponds to the weights used in the paper.
  - <reads>__<db>.predict.tsv - Prediction timeline (each row corresponds to one minute).
- plots/ - Plotted figures.
  - <reads>__<db>.timeline.pdf - Prediction as a function of time.
  - <reads>__<db>.snapshot.<time>.pdf - Rank plot for selected times (1 minute, 5 minutes, last minute).
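The naming scheme above makes it possible to work out where the results of a given experiment will appear. The following sketch is not part of RASE: the function name expected_outputs is hypothetical, and it lists only the outputs whose names are fully determined by the read set and database names (per-timestamp weight tables and snapshot plots are omitted because their names depend on the run).

```shell
# Hypothetical helper (not part of RASE): print the output paths that one
# <reads>/<db> experiment is expected to produce, following the layout
# described in this README.
# Usage: expected_outputs READS_NAME DB_NAME
expected_outputs() {
    local reads="$1" db="$2"
    printf '%s\n' \
        "matching/${reads}.fa" \
        "matching/${reads}__${db}.bam" \
        "prediction/${reads}__${db}.predict.tsv" \
        "plots/${reads}__${db}.timeline.pdf"
}
```

A loop such as `for f in $(expected_outputs myreads mydb); do [ -e "$f" ] || echo "missing: $f"; done` (with placeholder names myreads and mydb) can then report which files have not been produced yet.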
- S. pneumoniae RASE DB.
- N. gonorrhoeae RASE DB.
- RASE supplementary. Supplementary Materials for the RASE paper, including figures and tables.
MIT.