Merge pull request #7 from IARCbioinfo/v2.0b
Added support for paired-end and drop support for ABRA1
Showing 6 changed files with 143 additions and 161 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,57 +1,90 @@ | ||
# Nextflow pipeline for ABRA (Assembly Based ReAligner) | ||
# abra-nf | ||
|
||
Apply [ABRA](https://github.com/mozack/abra) to realign next generation sequencing data using localized assembly in a set of BAM files. After ABRA, the mate information is fixed using [`samtools fixmate`](http://www.htslib.org/doc/samtools.html) and BAM files are sorted and indexed using [sambamba](http://lomereiter.github.io/sambamba/). | ||
## Nextflow pipeline for ABRA2 (Assembly Based ReAligner) | ||
|
||
![Workflow representation](abra-nf.png) | ||
|
||
## Description | ||
|
||
Apply [ABRA2](https://github.com/mozack/abra2) to realign next generation sequencing data using localized assembly in a set of BAM files. | ||
|
||
This script takes a set of [BAM files](https://samtools.github.io/hts-specs/) (named `*.bam`), grouped in folders, as input. There are two modes: | ||
- When using matched tumor/normal pairs, the two samples of each pair are realigned together (see https://github.com/mozack/abra#somatic--mode). In this case the user has to provide as an input the folders containing tumor (`--tumor_bam_folder`) and normal BAM files (`--normal_bam_folder`) (it can be the same unique folder). The tumor bam file format must be (`sample` `suffix_tumor` `.bam`) with `suffix_tumor` as `_T` by default and customizable in input (`--suffix_tumor`). (e.g. `sample1_T.bam`). The normal bam file format must be (`sample` `suffix_normal` `.bam`) with `suffix_normal` as `_N` by default and customizable in input (`--suffix_normal`). (e.g. `sample1_N.bam`). | ||
- When using only normal (or only tumor) samples, each bam is treated independently. In this case the user has to provide a single folder containing all BAM files (`bam_folder`). | ||
- When using only normal (or only tumor) samples, each bam is treated independently. In this case the user has to provide a single folder containing all BAM files (`--bam_folder`). | ||
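
The naming convention for tumor-normal mode can be checked before launching the pipeline. The following is a minimal sketch, not part of abra-nf: `list_pairs` is a hypothetical helper that assumes the default `_T` / `_N` suffixes unless others are given, and prints one tab-separated line per sample for which both files exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of abra-nf): list tumor/normal BAM pairs by file name.
# Usage: list_pairs TUMOR_DIR NORMAL_DIR [SUFFIX_TUMOR] [SUFFIX_NORMAL]
list_pairs() {
    local tumor_dir=$1 normal_dir=$2
    local suffix_tumor=${3:-_T} suffix_normal=${4:-_N}
    local tumor sample normal
    for tumor in "$tumor_dir"/*"$suffix_tumor".bam; do
        [ -e "$tumor" ] || continue                      # the glob matched nothing
        sample=$(basename "$tumor" "$suffix_tumor.bam")  # sample1_T.bam -> sample1
        normal="$normal_dir/$sample$suffix_normal.bam"
        if [ -e "$normal" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$tumor" "$normal"
        else
            printf 'missing normal for %s: %s\n' "$sample" "$normal" >&2
        fi
    done
}
```

Calling `list_pairs BAM/ BAM/` covers the case where tumor and normal files share a single folder; samples without a matching normal file are reported on standard error.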
|
||
In all cases BAI indexes have to be present in the same location than their BAM mates and called *.bam.bai`. | ||
In all cases, BAI indexes must be present in the same location as their BAM files and be named `*.bam.bai`. | ||
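
The index requirement can be verified with a few lines of shell. This is only a sketch under the naming rule stated above: `missing_bai` is a hypothetical helper, not something the pipeline provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of abra-nf): print every BAM file in a folder
# that has no *.bam.bai index next to it; returns 1 if any index is missing.
missing_bai() {
    local dir=$1 bam status=0
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue      # the glob matched nothing
        if [ ! -e "$bam.bai" ]; then
            echo "$bam"
            status=1
        fi
    done
    return "$status"
}
```

For a coordinate-sorted BAM, `samtools index file.bam` writes `file.bam.bai` alongside it, which is one way to create any index the check reports as missing.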
|
||
For [ABRA2](https://github.com/mozack/abra2) compatibility, use the option `--abra2` | ||
Note that ABRA v1 is no longer supported (see the last version supporting it here: https://github.com/IARCbioinfo/abra-nf/releases/tag/v1.0) | ||
|
||
## How to install | ||
## Dependencies | ||
|
||
1. Install [java](https://java.com/download/) JRE if you don't already have it. | ||
1. This pipeline is based on [nextflow](https://www.nextflow.io). As we have several nextflow pipelines, we have centralized the common information in the [IARC-nf](https://github.com/IARCbioinfo/IARC-nf) repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines. | ||
|
||
2. Install [nextflow](http://www.nextflow.io/). | ||
2. External software: | ||
- [java](https://www.java.com/) | ||
- [ABRA2](https://github.com/mozack/abra2) jar file | ||
|
||
```bash | ||
curl -fsSL get.nextflow.io | bash | ||
``` | ||
And move it to a location in your `$PATH` (`/usr/local/bin` for example here): | ||
```bash | ||
sudo mv nextflow /usr/local/bin | ||
``` | ||
|
||
3. Install and put in your PATH: [java](https://www.java.com/), [bedtools](http://bedtools.readthedocs.io/en/latest/), [bwa](http://bio-bwa.sourceforge.net), [sambamba](http://lomereiter.github.io/sambamba/), [samtools](http://www.htslib.org/) and download ABRA jar. Alternatively (recommended), you can simply use the docker image provided (see below). | ||
You can avoid installing all the external software by only installing Docker. See the [IARC-nf](https://github.com/IARCbioinfo/IARC-nf) repository for more information. | ||
|
||
## How to run | ||
## Input | ||
|
||
Simply use example: | ||
```bash | ||
nextflow run iarcbioinfo/abra-nf --bam_folder BAM/ --bed target.bed --ref ref.fasta --read_length 100 --abra_path /path/to/abra.jar | ||
``` | ||
* #### In tumor-normal mode | ||
|
||
By default, BAM files produced are output in the same folder as the input folder with the `abra_sorted_fixmate.bam` suffix. One can also specify the output folder by adding the optional argument `--out_folder BAM_ABRA` to the above command line for example. | ||
| Name | Description | | ||
|-----------|---------------| | ||
| `--tumor_bam_folder` | Folder containing tumor BAM files | | ||
| `--normal_bam_folder` | Folder containing matched normal BAM files | | ||
| `--suffix_tumor` | Suffix identifying tumor bam (default: `_T`) | | ||
| `--suffix_normal` | Suffix identifying normal bam (default: `_N`) | | ||
|
||
You can print the help by providing `--help` in the execution command line: | ||
```bash | ||
nextflow run iarcbioinfo/abra-nf --help | ||
``` | ||
* #### Otherwise | ||
|
||
Instead of installing all tools in step 3 above, we recommend to use the docker image we provide containing them by simply adding `-with-docker`: | ||
```bash | ||
nextflow run iarcbioinfo/abra-nf -with-docker ... | ||
``` | ||
| Name | Description | | ||
|-----------|---------------| | ||
| `--bam_folder` | Folder containing BAM files | | ||
|
||
## Parameters | ||
|
||
Installing [docker](https://www.docker.com) is very system specific (but quite easy in most cases), follow [docker documentation](https://docs.docker.com/installation/). Also follow the optional configuration step called `Create a Docker group` in their documentation. | ||
* #### Mandatory | ||
|
||
## Detailed instructions | ||
| Name | Example value | Description | | ||
|-----------|---------------|-----------------| | ||
| `--ref` | `/path/to/ref.fasta` | Indexed reference FASTA file | | ||
| `--abra_path` | `/path/to/abra2.jar` | Explicit path to the ABRA2 jar file | | ||
|
||
The exact same pipeline can be run on your computer or on a HPC cluster, by adding a [nextflow configuration file](http://www.nextflow.io/docs/latest/config.html) to choose an appropriate [executor](http://www.nextflow.io/docs/latest/executor.html). For example to work on a cluster using [SGE scheduler](https://en.wikipedia.org/wiki/Oracle_Grid_Engine), simply add a file named `nextflow.config` in the current directory (or `~/.nextflow/config` to make global changes) containing: | ||
```java | ||
process.executor = 'sge' | ||
* #### Optional | ||
|
||
| Name | Default value | Description | | ||
|-----------|---------------|-----------------| | ||
| `--bed` | `/path/to/intervals.bed` | Bed file containing intervals | | ||
| `--mem` | 16 | Maximum RAM used | | ||
| `--threads` | 4 | Number of threads used | | ||
| `--output_folder` | `abra_BAM/` | Folder where the realigned BAM files are written | | ||
|
||
* #### Flags | ||
|
||
Flags are special parameters without value. | ||
|
||
| Name | Description | | ||
|-----------|-----------------| | ||
| `--help` | Display help | | ||
| `--single` | Switch to single-end sequencing mode | | ||
|
||
## Usage | ||
|
||
Simple use case example: | ||
```bash | ||
nextflow run iarcbioinfo/abra-nf --bam_folder BAM/ --bed target.bed --ref ref.fasta --abra_path /path/to/abra.jar | ||
``` | ||
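
For tumor-normal mode, a command along the following lines should work. It is a sketch that only combines the parameters documented above; all paths are placeholders, and the `--mem`, `--threads` and `--output_folder` values simply restate the documented defaults. The command is built as an array and printed as a dry run so it can be inspected first.

```bash
#!/usr/bin/env bash
# Placeholder paths (assumptions): replace them with your own folders and files.
cmd=(nextflow run iarcbioinfo/abra-nf
     --tumor_bam_folder BAM_tumor/
     --normal_bam_folder BAM_normal/
     --suffix_tumor _T
     --suffix_normal _N
     --bed target.bed
     --ref ref.fasta
     --abra_path /path/to/abra2.jar
     --mem 16
     --threads 4
     --output_folder abra_BAM/)

# Dry run: print the command instead of executing it.
printf '%s ' "${cmd[@]}"; echo

# Uncomment the next line to actually launch the pipeline.
# "${cmd[@]}"
```

The suffix options can be dropped when the files already follow the default `_T` / `_N` naming.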
|
||
Other popular schedulers such as LSF, SLURM, PBS, TORQUE etc. are also compatible. See the nextflow documentation [here](http://www.nextflow.io/docs/latest/executor.html) for more details. Also have a look at the [other parameters for the executors](http://www.nextflow.io/docs/latest/config.html#scope-executor), in particular `queueSize` that defines the number of tasks the executor will handle in a parallel manner. | ||
## Output | ||
| Type | Description | | ||
|-----------|---------------| | ||
| ABRA BAM | Realigned BAM files with their indexes | | ||
|
||
## Contributions | ||
|
||
| Name | Email | Description | | ||
|-----------|---------------|-----------------| | ||
| Matthieu Foll* | [email protected] | Developer to contact for support | | ||
| Nicolas Alcala | [email protected] | Developer | |