A Snakemake workflow for <description>
The usage of this workflow is described in the Snakemake Workflow Catalog.
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository or its DOI.
This workflow is a best-practice workflow for <detailed description>
.
The workflow is built using snakemake and consists of the following steps:
- Parse sample sheet containing sample meta data (
python
) - Simulate short read sequencing data on the fly (
dwgsim
) - Check quality of input read data (
FastQC
) - Trim adapters from input data (
cutadapt
) - Collect statistics from tool output (
MultiQC
)
This template workflow creates artificial sequencing data in *.fastq.gz
format. It does not contain actual input data. The simulated input files are nevertheless created based on a mandatory table linked in the config.yml
file (default: .test/samples.tsv
). The sample sheet has the following layout:
sample | condition | replicate | read1 | read2 |
---|---|---|---|---|
sample1 | wild_type | 1 | sample1.bwa.read1.fastq.gz | sample1.bwa.read2.fastq.gz |
sample2 | wild_type | 2 | sample2.bwa.read1.fastq.gz | sample2.bwa.read2.fastq.gz |
To run the workflow from command line, change the working directory.
cd path/to/snakemake-workflow-name
Adjust options in the default config file config/config.yml
.
Before running the entire workflow, you can perform a dry run using:
snakemake --dry-run
To run the complete workflow with test files using conda, execute the following command. The definition of the number of compute cores is mandatory.
snakemake --cores 3 --sdm conda --directory .test
To run the workflow with singularity / apptainer, add a link to a container registry in the Snakefile
, for example:
container: "oras://ghcr.io/<user>/<repository>:<version>"
for Github's container registry. Run the workflow with:
snakemake --cores 3 --sdm conda apptainer --directory .test
This table lists all parameters that can be used to run the workflow.
parameter | type | details | default |
---|---|---|---|
samplesheet | |||
path | str | path to samplesheet, mandatory | "config/samples.tsv" |
get_genome | |||
database | str | one of manual , ncbi |
ncbi |
assembly | str | RefSeq ID | GCF_000006785.2 |
fasta | str | optional path to fasta file | Null |
gff | str | optional path to gff file | Null |
gff_source_type | str | list of name/value pairs for GFF source | see config file |
simulate_reads | |||
read_length | num | length of target reads in bp | 100 |
read_number | num | number of total reads to be simulated | 100000 |
random_freq | num | frequency of random read sequences | 0.01 |
cutadapt | |||
threep_adapter | str | sequence of the 3' adapter | -a ATCGTAGATCGG |
fivep_adapter | str | sequence of the 5' adapter | -A GATGGCGATAGG |
default | str | additional options passed to cutadapt |
[-q 10 , -m 25 , -M 100 , --overlap=5 ] |
multiqc | |||
config | str | path to multiQC config | config/multiqc_config.yml |
- Firstname Lastname
- Affiliation
- ORCID profile
- home page
Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. Sustainable data analysis with Snakemake. F1000Research, 10:33, 10, 33, 2021. https://doi.org/10.12688/f1000research.29032.2.
- Replace
<owner>
and<repo>
everywhere in the template (also under .github/workflows) with the correct<repo>
name and owning user or organization. - Replace
<name>
with the workflow name (can be the same as<repo>
). - Replace
<description>
with a description of what the workflow does. - Update the workflow description, parameters, running options, authors and references in the
README.md
- Update the
README.md
badges. Add or remove badges forconda
/singularity
/apptainer
usage depending on the workflow's capability - The workflow will occur in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if
<owner>
and<repo>
were correctly set.