snamburi3 edited this page Apr 19, 2019 · 42 revisions

CloudNeo User Guide

Overview

This repository contains the CWL implementation of CloudNeo, a cloud pipeline for identifying patient-specific tumor neoantigens. The workflow was developed on the Seven Bridges Genomics CGC platform using the CWL draft-2 specification. As of March 2017, the CGC still uses CWL draft-2, which differs from the current CWL version, v1.0.

Table of Contents

  1. Required Software
  2. Setting up the Environment
  3. Building the Docker Images
  4. Download the reference files
  5. Running/Testing the CWL with the Rabix executor

Required Software

Note: The Dockerfiles point directly to the repositories or links from which each tool is downloaded, except for the netMHC and netMHCpan tools, which have no direct download link.

Setting up the Environment

To set up the environment, install Docker, Java, and Rabix (Rabix is used mainly for testing the CWL code). These instructions have been tested on Ubuntu, but they should work on any Linux or Unix-like system.

Install Docker (Ubuntu)

  • sudo apt-get update
  • sudo apt-get install apt-transport-https ca-certificates
  • sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
  • Open the /etc/apt/sources.list.d/backports.list file in your favorite editor.
  • sudo apt-get update
  • sudo apt-cache policy docker-engine
  • sudo apt-get install docker-engine
  • sudo service docker start
  • sudo gpasswd -a ${USER} docker
  • sudo service docker restart
  • sudo chown "$USER":"$USER" /home/"$USER"/.docker -R
  • sudo chmod g+rwx "/home/$USER/.docker" -R

Full instructions for installing Docker are available on the Docker Install Page.

Install Java:

export JAVA_HOME=<YOUR INSTALLATION DIRECTORY>
export PATH="$JAVA_HOME/bin:$PATH"

Download Rabix:

wget https://github.com/rabix/bunny/releases/download/v1.0.0-rc3/rabix-1.0.0-rc3.tar.gz && tar -xvf rabix-1.0.0-rc3.tar.gz

Clone the CloudNeo repository:

To download the CloudNeo repository, run:

git clone https://github.com/TheJacksonLaboratory/CloudNeo.git
  • If git is not installed, install it with:
sudo apt-get install git-core
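
Before moving on, it can help to confirm that the tools from this section are actually reachable. The following is a minimal sketch, not part of the CloudNeo repository: `check_tools` is a hypothetical helper name, and the Rabix path is assumed to be the directory unpacked from the tarball above, in the current working directory.

```shell
# Print whether each named command is on PATH; return 1 if any is missing.
# check_tools is a hypothetical helper, not part of CloudNeo.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools installed in this section.
check_tools docker java git || echo "Install the missing tools before continuing."

# Rabix is unpacked locally rather than installed on PATH, so check it by path
# (assumes the tarball was extracted in the current directory).
if [ -x ./rabix-backend-local-1.0.0-rc3/rabix ]; then
    echo "found: rabix"
else
    echo "missing: rabix (expected at ./rabix-backend-local-1.0.0-rc3/rabix)"
fi
```

The check only tests that the commands exist; it does not verify versions or that the Docker daemon is running.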

Building the Docker Images

The Dockerfiles used to develop the workflow are provided in the 'dockerfiles' folder. All the Dockerfiles use the ubuntu:14.0 version. To build the Docker images, run the commands shown below from the CloudNeo directory.

Important Note: Because of licensing requirements, we do not provide the Docker images themselves. We provide the Dockerfiles required to build the images. Please follow the commands shown below to build them.

Important Note: There is no direct link to download the netMHC and netMHCpan software. The original authors email the software after the following form is filled in. You need to copy the software (Linux version) into the netMHC or netMHCpan Dockerfile directory to build the image.

Note: Before you build the netMHCpan image, download the netMHCpan software and place it in the netMHCpan.v3.0a directory. Make sure you download version 3.0a, and check that the .tar.gz file name in the Dockerfile matches the file you downloaded. The Docker image is built from the Linux version of the software.

Note: Before you build the netMHC image, download the netMHC software and place it in the netMHC.v4.0a directory. Make sure you download version 4.0a, and check that the .tar.gz file name in the Dockerfile matches the file you downloaded. The Docker image is built from the Linux version of the software.
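
A quick pre-build check can catch a missing netMHC or netMHCpan archive before `docker build` fails partway through. The sketch below assumes it is run from the CloudNeo directory and that the archives sit in the two Dockerfile directories named in the build commands; `has_tarball` is a hypothetical helper, and the check does not verify that the file name matches the one in the Dockerfile.

```shell
# Return 0 if the given directory contains at least one .tar.gz file.
# has_tarball is a hypothetical helper, not part of CloudNeo.
has_tarball() {
    for f in "$1"/*.tar.gz; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Directories taken from the docker build commands in this guide.
for dir in dockerfiles/netMHCpan.v3.0a dockerfiles/netMHC.v4.0a; do
    if has_tarball "$dir"; then
        echo "ok: tarball found in $dir"
    else
        echo "missing: no .tar.gz in $dir"
    fi
done
```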

	# build bwa=0.7.13 image
	docker build -t bwa:cloudneo -f dockerfiles/bwa.v0.7.13/Dockerfile .
	
	# build hlaminer=1.3 image
	docker build -t hlaminer:cloudneo -f dockerfiles/hlaminer/Dockerfile .
	
	# build netmhcpan=3.0a image
	## Important: before you build the image, download the netMHCpan software (version 3.0a) and place it in the netMHCpan.v3.0a directory. Check that the .tar.gz file name in the Dockerfile matches.
	docker build -t netmhcpan:cloudneo -f dockerfiles/netMHCpan.v3.0a/Dockerfile .
	
	# build netmhc=4.0a image
	## Important: before you build the image, download the netMHC software (version 4.0a) and place it in the netMHC.v4.0a directory. Check that the .tar.gz file name in the Dockerfile matches.
	docker build -t netmhc:cloudneo -f dockerfiles/netMHC.v4.0a/Dockerfile .

	# build polysolver image
	docker build -t polysolver:cloudneo -f dockerfiles/polysolver/Dockerfile .
	
	# build protein-translator image
	docker build -t protein-translator:cloudneo -f dockerfiles/protein-translator/Dockerfile .

	# build samtools=1.3 image
	docker build -t samtools:cloudneo -f dockerfiles/samtools.v1.3/Dockerfile .

	# build variant-effect-predictor=83 image
	docker build -t variant-effect-predictor:cloudneo -f dockerfiles/variant-effect-predictor/Dockerfile .
	
	# build vcf-parser image
	docker build -t vcf-parser:cloudneo -f dockerfiles/vcf-parser/Dockerfile .
	

The cloudneo.cwl CWL file already uses the Docker image names above (<name>:cloudneo), so you do not need to edit it. If you gave an image a different name (the -t parameter value), edit the CWL code to use that name. To find the Docker image references in the CWL, search for the pattern ":cloudneo".
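
One way to run that search is with `grep`. This is a minimal sketch: `list_cloudneo_images` is a hypothetical helper, and it assumes image names consist only of letters, digits, dots, underscores, and hyphens, as the names in this guide do.

```shell
# List the distinct <name>:cloudneo image references found in a CWL file.
# list_cloudneo_images is a hypothetical helper, not part of CloudNeo.
list_cloudneo_images() {
    grep -o '[A-Za-z0-9._-]*:cloudneo' "$1" | sort -u
}

# Run from the CloudNeo directory, where cloudneo.cwl is expected to live.
if [ -f cloudneo.cwl ]; then
    list_cloudneo_images cloudneo.cwl
fi
```

Comparing this list with the output of `docker images` shows whether every image the workflow references has been built under the expected name.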

Download the reference files

The input specification is described in detail in the manual.

Please refer to the CGC manual for a detailed explanation of the inputs.

To run this example, you need the following files:

  • netmhcpan-3.0.imgt.fasta
  • sample.vcf (see VCF file format)
  • homo_sapiens_vep_83_GRCh37.tar.gz
  • Homo_sapiens.GRCh37.75.gtf
  • HumanProteins.GRCh37.75.csv
  • HLA-I_II_CDS.fasta
  • sample.bam (not provided in the example folder)

A sample input specification file is included in the example repository. Make sure the paths in the file point to the directory where you downloaded the reference files and the input BAM and VCF files (see VCF file format). The BAM and VCF files are not provided. Please refer to the Sample VCF file format guide.
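
To confirm that everything in the file list above is in place, a small check like the following may help. It is only a sketch: `check_reference_files` and the `REF_DIR` variable are hypothetical, and it assumes all files were downloaded into a single directory, which your setup may not do.

```shell
# Report which of the files listed in this guide exist in a directory;
# return 1 if any is missing. check_reference_files is a hypothetical helper.
check_reference_files() {
    ref_dir=$1
    status=0
    for name in \
        netmhcpan-3.0.imgt.fasta \
        sample.vcf \
        homo_sapiens_vep_83_GRCh37.tar.gz \
        Homo_sapiens.GRCh37.75.gtf \
        HumanProteins.GRCh37.75.csv \
        HLA-I_II_CDS.fasta \
        sample.bam
    do
        if [ -e "$ref_dir/$name" ]; then
            echo "found: $name"
        else
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}

# REF_DIR is an assumed variable naming your download directory;
# it defaults to the current directory.
check_reference_files "${REF_DIR:-.}" || echo "Download the missing files before running the workflow."
```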

Running/Testing the CWL with the Rabix executor

To test the CWL with Rabix, we have included example files in the GitHub repository; see the test directory. The paths in the inputs.json file correspond to the test directory.

    ./rabix-backend-local-1.0.0-rc3/rabix cloudneo.cwl inputs.json

See the example log produced by running Rabix on the CloudNeo CWL.
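
To keep a log of your own run for comparison, the Rabix command can be wrapped so that its output is saved as well as displayed. This is a minimal sketch: `run_cloudneo` and the log file name `cloudneo-rabix.log` are hypothetical, and the wrapper only checks that the three paths exist before invoking Rabix with the same two arguments used above.

```shell
# Run Rabix on a workflow and inputs file, copying all output to a log file.
# run_cloudneo and cloudneo-rabix.log are hypothetical names, not part of CloudNeo.
run_cloudneo() {
    rabix_bin=$1
    workflow=$2
    inputs=$3
    if [ ! -x "$rabix_bin" ]; then
        echo "rabix executable not found: $rabix_bin" >&2
        return 1
    fi
    for f in "$workflow" "$inputs"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done
    # Merge stderr into stdout so the log captures both streams.
    "$rabix_bin" "$workflow" "$inputs" 2>&1 | tee cloudneo-rabix.log
}

# Example invocation (uncomment to run):
# run_cloudneo ./rabix-backend-local-1.0.0-rc3/rabix cloudneo.cwl inputs.json
```

Because the output passes through `tee`, the function's exit status reflects `tee` rather than Rabix, so check the log itself to see whether the workflow succeeded.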