Merge pull request #4 from ewels/master
Many, many updates
ewels authored Dec 14, 2016
2 parents 1e6d399 + a671695 commit 0168648
Showing 9 changed files with 1,177 additions and 567 deletions.
3 changes: 2 additions & 1 deletion .gitattributes
@@ -1 +1,2 @@
*.nf linguist-language=Groovy
*.nf linguist-language=Groovy
*.config linguist-language=Groovy
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
.nextflow*
work/
data/
data/
results/
.DS_Store
113 changes: 75 additions & 38 deletions README.md
@@ -14,53 +14,62 @@ the results files.

## Installation
### NextFlow installation
To use this pipeline, you need to have a working version of NextFlow installed. You can find more
information about this pipeline tool at [nextflow.io](http://www.nextflow.io/). The typical installation
of NextFlow looks like this:
See https://github.com/SciLifeLab/NGI-NextflowDocs for instructions on how to install and configure
Nextflow.

```
curl -fsSL get.nextflow.io | bash
mv ./nextflow ~/bin
```
### Pipeline installation
This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub when run if
`SciLifeLab/NGI-MethylSeq` is specified as the pipeline name.

#### UPPMAX
If you're running on a Swedish UPPMAX cluster you can load NextFlow as an environment module instead:
If you prefer, you can download the files yourself from GitHub and run them directly:
```
module load nextflow
git clone https://github.com/SciLifeLab/NGI-MethylSeq.git
nextflow NGI-MethylSeq/main.nf
```

The first time you load this you will get a warning about setting environment variables. To automatically set these at login, you can add the following lines to your `~/.bashrc` file:
```bash
export NXF_LAUNCHBASE=$SNIC_TMP
export NXF_TEMP=$SNIC_TMP
## Configuration
By default, the pipeline is configured to run on the Swedish UPPMAX cluster (milou / irma).

You will need to specify your UPPMAX project ID when running a pipeline. To do this, use
the command line flag `--project <project_ID>`.

To avoid having to specify this every time you run Nextflow, you can add it to your
personal Nextflow config file instead. Add this line to `~/.nextflow/config`:

```groovy
params.project = 'project_ID'
```

### NextFlow configuration
Next, you need to set up a config file so that NextFlow knows how to run and where to find reference
indexes. You can find an example configuration file for UPPMAX (milou) with this repository:
[`example_uppmax_config`](https://github.com/SciLifeLab/NGI-MethylSeq/blob/master/example_uppmax_config).
The pipeline will exit with an error message if you try to run it with the default
UPPMAX config profile and don't set `project`.

Copy this file to `~/.nextflow/config` and edit the line `'-A YOUR_PROJECT_ID'` to contain your
UPPMAX project identifier.

It is entirely possible to run this pipeline on other clusters - just note that you may need to customise
the `process` environment (eg. if you're using a cluster system other than SLURM) and the paths to reference
files.
### Running on other clusters
It is entirely possible to run this pipeline on other clusters, though you will need to set up
your own config file so that the script knows where to find your reference files and how your
cluster works.

### Pipeline installation
This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub when run if
`SciLifeLab/NGI-MethylSeq` is specified as the pipeline name.
Copy the contents of [`conf/uppmax.config`](conf/uppmax.config) to your own config file somewhere
and then reference it with `-c` when running the pipeline.
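
As a rough illustration, a personal config file for a non-SLURM cluster might look like the sketch below. The file name, executor and queue are hypothetical placeholders, and the real keys you need (especially the reference genome paths) should be taken from `conf/uppmax.config` rather than from this example. The two `params` entries simply mirror the `--bismark_index` and `--outdir` command line options described later in this README.

```groovy
// my_cluster.config (hypothetical file name; all values are placeholders)
process {
    executor = 'sge'   // assumption: an SGE cluster; use the Nextflow executor that matches your scheduler
    queue = 'long'     // hypothetical queue name
}

params {
    bismark_index = '/path/to/bismark/index'   // equivalent to --bismark_index on the command line
    outdir = './results'                       // equivalent to --outdir on the command line
}
```

Pass this file with `-c my_cluster.config` when launching the pipeline.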

If you think that there are other people using the pipeline who would benefit from your configuration
(eg. other common cluster setups), please let us know. It should be easy to create a new config file
in `conf` and reference this as a named profile in [`nextflow.config`](nextflow.config). Then these
configuration options can be used by specifying `-profile <name>` when running the pipeline.
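
A named profile of this kind could be declared roughly as follows. This is only a sketch: the profile name `mycluster` and the file `conf/mycluster.config` are made up for illustration, and the profiles already defined in this repository's `nextflow.config` may be laid out differently.

```groovy
// nextflow.config (sketch; 'mycluster' and its config file are hypothetical)
profiles {
    mycluster {
        // Pull in the cluster-specific settings kept under conf/
        includeConfig 'conf/mycluster.config'
    }
}
```

With a block like this in place, `-profile mycluster` would select those settings.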

If you prefer, you can download the files yourself from GitHub and run them directly:
```
git clone https://github.com/SciLifeLab/NGI-MethylSeq.git
nextflow NGI-MethylSeq/main.nf
```

## Running the pipeline
The typical command for running the pipeline is as follows:
```
nextflow SciLifeLab/NGI-MethylSeq --reads '*_R{1,2}.fastq.gz' --genome GRCm38
nextflow SciLifeLab/NGI-MethylSeq --reads '*_R{1,2}.fastq.gz' --genome GRCh37
```

Note that the pipeline will create files in your working directory:
```bash
work # Directory containing the nextflow working files
results # Finished results (configurable, see below)
.nextflow.log # Log file from Nextflow
# Other nextflow hidden files, eg. history of pipeline runs and old logs.
```

### `--reads`
@@ -69,20 +78,35 @@ Location of the input FastQ files:
--reads 'path/to/data/sample_*_{1,2}.fastq'
```

**NB: Must be enclosed in quotes!**

Note that the `{1,2}` braces are required to specify paired-end data. Running `--reads '*.fastq'` will treat
all files as single-end. Also, note that the file path should be in quotation marks to prevent shell glob expansion.

If left unspecified, the pipeline will assume that the data is in a directory called `data` in the working directory.

### `--genome`
The reference genome to use for the analysis. This needs to be one of the genomes specified in the config file.
The human `GRCh37` genome is set as default.

See [`conf/uppmax.config`](conf/uppmax.config) for a list of the supported reference genomes
and their keys. Common genomes that are supported are:

* Human
* `--genome GRCh37`
* Mouse
* `--genome GRCm38`
* Drosophila
* `--genome BDGP6`
* _S. cerevisiae_
* `--genome 'R64-1-1'`

> There are numerous others - check the config file for more.
If you usually want to work with a single species, you can set a default in your user config file.
For example, add this line to `~/.nextflow/config`:
```
--genome 'GRCm38'
params.genome = 'GRCh37'
```
The `example_uppmax_config` file currently has the location of references for most of the
[Illumina iGenomes](http://support.illumina.com/sequencing/sequencing_software/igenome.html)
held on UPPMAX.

### Trimming Parameters
The pipeline accepts a number of parameters to change how the trimming is done, according to your data type.
@@ -105,7 +129,7 @@ You can specify custom trimming parameters as follows:

Finally, specifying `--rrbs` will pass on the `--rrbs` parameter to TrimGalore!

## Bismark Parameters
### Bismark Parameters
Using the `--pbat` parameter will affect the trimming (see above) and also set the `--pbat` flag when
aligning with Bismark.

@@ -114,6 +138,19 @@ This can also be set with `--non_directional` (doesn't affect trimming).

Use the `--unmapped` flag to set the `--unmapped` flag with Bismark align and save the unmapped reads.

### Deduplication
By default, the pipeline includes a deduplication step after alignment. If you would like to skip this
step (eg. for RRBS data), use the `--nodedup` command line option.

### `--bismark_index`
If you prefer, you can specify the full path to your reference genome when you run the pipeline:
```
--bismark_index [path to Bismark index]
```

### `--outdir`
The output directory where the results will be saved.

### `-c`
Specify the path to a specific config file (this is a core NextFlow command). Useful if using different UPPMAX
projects or different sets of reference genomes.