Merge pull request #12 from alliance-genome/KANBAN-507_managed-workflow
KANBAN-507 managed workflow implementation
Showing 16 changed files with 266 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
FROM biocontainers/clustalo:v1.2.4-2-deb_cv1 | ||
|
||
ENTRYPOINT [ "clustalo"] | ||
CMD [ "--help" ] | ||
USER root | ||
CMD [ "clustalo", "--help" ] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,11 @@ | ||
CONTAINER_NAME=agr_pavi/alignment | ||
ADDITIONAL_BUILD_ARGS= | ||
|
||
.PHONY: clean | ||
|
||
clean: | ||
$(eval ADDITIONAL_BUILD_ARGS := --no-cache) | ||
@: | ||
|
||
container-image: | ||
docker build --no-cache -t agr_pavi/alignment . | ||
docker build ${ADDITIONAL_BUILD_ARGS} -t ${CONTAINER_NAME} . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,23 @@ | ||
# Manual invokation and testing instructions | ||
First build the docker image: | ||
This subdirectory contains the alignment component of PAVI. | ||
|
||
# Local invocation and testing instructions | ||
To build the docker image: | ||
```bash | ||
make container-image | ||
``` | ||
|
||
To build a clean docker image (without using caching, for troubleshooting potential caching issues): | ||
```bash | ||
make clean container-image | ||
``` | ||
|
||
Then run the container to run any alignment. | ||
|
||
Use a volume mount (`-v`) as appropriate to enable the container access to the input and output directorie(s). | ||
Use a volume mount (`-v`) as appropriate to provide the container access to the input and output directorie(s) | ||
on your local system. | ||
Specify the clustalo command-line arguments as appropriate after the `docker run` command, as per below example: | ||
```bash | ||
docker run -v /abs/path/to/in-out-dir:/mnt/pavi/ --rm pavi/alignment -i /mnt/pavi/input-seqs.fa -outfmt=clustal -o /mnt/pavi/clustal-output.aln | ||
docker run -v /abs/path/to/in-out-dir:/mnt/pavi/ --rm agr_pavi/alignment \ | ||
clustalo -i /mnt/pavi/input-seqs.fa --outfmt=clustal --resno -o /mnt/pavi/clustal-output.aln | ||
``` | ||
Once the run completed, Clustal-formattted alignment results can then be found locally in `</abs/path/to/in-out-dir>/clustal-output.aln`. | ||
Once the run has completed, the Clustal-formatted alignment results can be found locally in `</abs/path/to/in-out-dir>/clustal-output.aln`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# NextFlow output files | ||
nextflow | ||
.nextflow/ | ||
.nextflow.log.* | ||
work/ | ||
pipeline-results/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
RELEASE=23.10.1 | ||
|
||
.PHONY: build-workflow-local-deps run-workflow-local | ||
|
||
nextflow: | ||
curl -L https://github.com/nextflow-io/nextflow/releases/download/v${RELEASE}/nextflow-${RELEASE}-all -o nextflow | ||
chmod u+x nextflow | ||
|
||
build-workflow-local-deps: | ||
make -C ../seq_retrieval/ container-image | ||
make -C ../alignment/ container-image | ||
|
||
run-integration-test: nextflow | ||
./nextflow run -profile test protein-msa.nf | ||
@diff -qs pipeline-results/alignment-output.aln tests/resources/integration-test-results.aln |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
This subdirectory contains all code that defines the workflows, | ||
which tie all pipeline components together into a fully functional and scalable pipeline | ||
comprising all data retrieval and computation required for each PAVI alignment. | ||
To that goal, NextFlow is used as workflow manager and Domain Specific Language. | ||
|
||
To download nextflow: | ||
```bash | ||
make nextflow | ||
``` | ||
|
||
To run the protein MSA workflow locally: | ||
1. Build all required components locally: | ||
```bash | ||
make build-workflow-local-deps | ||
``` | ||
2. Run the pipeline with appropriate input arguments, as in the example below: | ||
```bash | ||
./nextflow run protein-msa.nf --input_seq_regions '[ | ||
{"name": "C54H2.5.1", "seq_id": "X", "seq_strand": "-", | ||
"seq_regions": "[\"5780644..5780722\", \"5780278..5780585\", \"5779920..5780231\", \"5778875..5779453\"]", | ||
"fasta_file_url": "https://s3.amazonaws.com/agrjbrowse/fasta/GCF_000002985.6_WBcel235_genomic.fna.gz"}, | ||
{"name": "ERV29-S288C", "seq_id": "chrVII", "seq_strand": "-", "seq_regions": "[\"1061590..1060658\"]", | ||
"fasta_file_url": "https://s3.amazonaws.com/agrjbrowse/fasta/GCF_000146045.2_R64_genomic.fna.gz"} | ||
]' | ||
``` |
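The `--input_seq_regions` value in the example above is a JSON array in which each record's `seq_regions` field is itself a JSON-encoded string, which is why the inner quotes are escaped. A minimal Python sketch of building that string follows. The helper name `build_input_seq_regions` is hypothetical and not part of this commit; the record fields are copied from the README example.

```python
import json


def build_input_seq_regions(entries):
    """Serialize entries into a string for --input_seq_regions.

    Each entry's seq_regions list is JSON-encoded into a string first,
    mirroring the nested quoting used in the README example.
    """
    records = []
    for entry in entries:
        record = dict(entry)  # copy so the caller's data is left untouched
        record["seq_regions"] = json.dumps(entry["seq_regions"])
        records.append(record)
    return json.dumps(records)


entries = [
    {
        "name": "ERV29-S288C",
        "seq_id": "chrVII",
        "seq_strand": "-",
        "seq_regions": ["1061590..1060658"],
        "fasta_file_url": "https://s3.amazonaws.com/agrjbrowse/fasta/"
                          "GCF_000146045.2_R64_genomic.fna.gz",
    },
]

param = build_input_seq_regions(entries)
print(param)
```

The resulting string could then be passed as a single argument, for instance with `subprocess.run(["./nextflow", "run", "protein-msa.nf", "--input_seq_regions", param])`, which avoids a second layer of shell quoting.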
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
docker.enabled = true | ||
|
||
profiles { | ||
test { | ||
includeConfig 'tests/integration/test.config' | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
params.input_seq_regions | ||
|
||
process sequence_retrieval { | ||
container 'agr_pavi/seq_retrieval' | ||
|
||
input: | ||
val request_map | ||
|
||
output: | ||
path "${request_map.name}-protein.fa" | ||
|
||
script: | ||
""" | ||
main.py --output_type protein \ | ||
--name ${request_map.name} --seq_id ${request_map.seq_id} --seq_strand ${request_map.seq_strand} \ | ||
--fasta_file_url ${request_map.fasta_file_url} --seq_regions '${request_map.seq_regions}' \ | ||
> ${request_map.name}-protein.fa | ||
""" | ||
} | ||
|
||
process alignment { | ||
container 'agr_pavi/alignment' | ||
|
||
publishDir "pipeline-results/", mode: 'copy' | ||
|
||
input: | ||
path 'alignment-input.fa' | ||
|
||
output: | ||
path 'alignment-output.aln' | ||
|
||
script: | ||
""" | ||
clustalo -i alignment-input.fa --outfmt=clustal --resno -o alignment-output.aln | ||
""" | ||
} | ||
|
||
workflow { | ||
def seq_region_channel = Channel.of(params.input_seq_regions).splitJson() | ||
|
||
seq_region_channel | sequence_retrieval | collectFile(name: 'alignment-input.fa', sort: { file -> file.name }) | alignment | ||
} |
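The data flow of the `workflow` block above can be approximated in plain Python to show why `collectFile` is given a `sort` closure: retrieval tasks may finish in any order, and sorting by file name keeps the concatenated `alignment-input.fa` deterministic. This is only an illustrative sketch, not Nextflow code. The function names are hypothetical stand-ins, and the FASTA header and `XXXX` sequence are dummy placeholders rather than real output of `main.py`.

```python
import json


def split_json(raw):
    """Mimic splitJson(): emit one record per element of a JSON array."""
    return json.loads(raw)


def sequence_retrieval(request):
    """Stand-in for the sequence_retrieval process.

    Returns (file name, FASTA text). The sequence is a dummy placeholder.
    """
    name = request["name"]
    return f"{name}-protein.fa", f">{name}\nXXXX\n"


def collect_file(files):
    """Mimic collectFile with sort by file name: concatenate in name order."""
    return "".join(text for _, text in sorted(files, key=lambda f: f[0]))


raw = '[{"name": "ERV29-S288C"}, {"name": "C54H2.5.1"}]'

# reversed() simulates tasks completing out of submission order.
files = [sequence_retrieval(r) for r in reversed(split_json(raw))]

alignment_input = collect_file(files)
print(alignment_input)
```

Whatever order the per-sequence files arrive in, the concatenated input lists `C54H2.5.1` before `ERV29-S288C`, which matches the sequence order in the integration test's expected alignment.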
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
params { | ||
input_seq_regions = '[{"name": "C54H2.5.1", "seq_id": "X", "seq_strand": "-", "seq_regions": "[\\"5780644..5780722\\", \\"5780278..5780585\\", \\"5779920..5780231\\", \\"5778875..5779453\\"]", "fasta_file_url": "https://s3.amazonaws.com/agrjbrowse/fasta/GCF_000002985.6_WBcel235_genomic.fna.gz"}, {"name": "ERV29-S288C", "seq_id": "chrVII", "seq_strand": "-", "seq_regions": "[\\"1061590..1060658\\"]", "fasta_file_url": "https://s3.amazonaws.com/agrjbrowse/fasta/GCF_000146045.2_R64_genomic.fna.gz"}]' | ||
} |
pipeline/workflow/tests/resources/integration-test-results.aln (26 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
CLUSTAL O(1.2.4) multiple sequence alignment | ||
|
||
|
||
C54H2.5.1 ------------------------MNQFRAPGGQN--EML-----------AKAE-DAAE 22 | ||
ERV29-S288C MSYRGPIGNFGGMPMSSSQGPYSGGAQFRSNQNQSTSGILKQWKHSFEKFASRIEGLTDN 60 | ||
***: .*. :* :: * : : | ||
|
||
C54H2.5.1 DFFRKTRTYLPHIARLCLVSTFLEDGIRMYFQWDDQKQFMQESWSCGWFIATLFVIYNFF 82 | ||
ERV29-S288C AVVYKLKPYIPSLSRFFIVATFYEDSFRILSQWSDQIFYLNKWKHYPYFFVVVFLVVVTV 120 | ||
.. * : *:* ::*: :*:** **.:*: **.** :::: :*:..:*:: . | ||
|
||
C54H2.5.1 GQFIPVLMIMLRKKVLVACGILASIVILQTIAYHILWDLKFLARNIAVGGGLLLLLAETQ 142 | ||
ERV29-S288C SMLIGASLLVLRKQTNYATGVLCACVISQALVYGLFTGSSFVLRNFSVIGGLLIAFSDSI 180 | ||
. :* . :::***:. * *:*.: ** *::.* :: . .*: **::* ****: :::: | ||
|
||
C54H2.5.1 EEKASLFAGVPTMGD-SNKPKSYMLLAGRVLLIFMFMSLMHFEMSFMQVLEIVVGFALIT 201 | ||
ERV29-S288C VQNKTTFGMLPELNSKNDKAKGYLLFAGRILIVLMFIAFTFSKSWFTVVLTI-IG---TI 236 | ||
:: : *. :* :.. .:* *.*:*:***:*:::**::: . : * ** * :* | ||
|
||
C54H2.5.1 LVSIGYKTKLSAIVLVIWLFGLNLWLNAWWTIPSDRFYRDFMKYDFFQTMSVIGGLLLVI 261 | ||
ERV29-S288C CFAIGYKTKFASIMLGLILTFYNITLNNYWFYNN--TKRDFLKYEFYQNLSIIGGLLLVT 294 | ||
.:******:::*:* : * *: ** :* . ***:**:*:*.:*:******* | ||
|
||
C54H2.5.1 AYGPGGVSVDDYKKRW 277 | ||
ERV29-S288C NTGAGELSVDEKKKIY 310 | ||
* * :***: ** : |
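The trailing number on each sequence line of the expected output comes from the `--resno` flag and is a running count of residues, with gap characters (`-`) excluded. A small Python sketch that recomputes the count for the first alignment block is given below as a sanity check. The function name `residue_counts` is hypothetical, and the two lines are copied from the expected output with column padding restored by assumption.

```python
def residue_counts(block_lines):
    """Return {sequence name: non-gap residue count} for alignment lines."""
    counts = {}
    for line in block_lines:
        fields = line.split()
        # Sequence lines have three fields: name, aligned residues, running count.
        if len(fields) == 3 and fields[2].isdigit():
            name, residues, _ = fields
            counts[name] = len(residues.replace("-", ""))
    return counts


first_block = [
    "C54H2.5.1   ------------------------MNQFRAPGGQN--EML-----------AKAE-DAAE 22",
    "ERV29-S288C MSYRGPIGNFGGMPMSSSQGPYSGGAQFRSNQNQSTSGILKQWKHSFEKFASRIEGLTDN 60",
]

print(residue_counts(first_block))
# → {'C54H2.5.1': 22, 'ERV29-S288C': 60}
```

For later blocks the printed number is cumulative, so each block's count would need to be added to the previous block's total before comparing.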