- Download Cromwell.

  ```bash
  $ cd
  $ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
  $ chmod +rx cromwell-34.jar
  ```
- Git-clone this pipeline and move into it.

  ```bash
  $ cd
  $ git clone https://github.com/ENCODE-DCC/atac-seq-pipeline
  $ cd atac-seq-pipeline
  ```
- Download a SUBSAMPLED (1/400) paired-end sample of ENCSR356KRQ.

  ```bash
  $ wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ/ENCSR356KRQ_fastq_subsampled.tar
  $ tar xvf ENCSR356KRQ_fastq_subsampled.tar
  ```
- Download the pre-built genome database for hg38.

  ```bash
  $ wget https://storage.googleapis.com/encode-pipeline-genome-data/test_genome_database_hg38_atac.tar
  $ tar xvf test_genome_database_hg38_atac.tar
  ```
- Install Conda. Skip this step if you already have an equivalent Conda installation (e.g. Anaconda Python). Download and run the installer, agreeing to the license terms by typing `yes`. The installer will ask for an installation location; on Stanford clusters (Sherlock and SCG4), we recommend installing it outside of your `$HOME` directory, since its filesystem is slow and has very limited space. At the end of the installation, choose `yes` to add Miniconda's binaries to `$PATH` in your BASH startup script.

  ```bash
  $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  $ bash Miniconda3-latest-Linux-x86_64.sh
  ```
- Install Conda dependencies.

  ```bash
  $ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
  $ bash conda/install_dependencies.sh
  ```
- Run a pipeline for the test sample.

  ```bash
  $ source activate encode-atac-seq-pipeline  # IMPORTANT!
  $ INPUT=examples/local/ENCSR356KRQ_subsampled.json
  $ PIPELINE_METADATA=metadata.json
  $ java -jar -Dconfig.file=backends/backend.conf cromwell-34.jar run atac.wdl -i ${INPUT} -m ${PIPELINE_METADATA}
  ```
- It will take about an hour. You will find all outputs under `cromwell-executions/atac/[RANDOM_HASH_STRING]/`. See the output directory structure for details.
- See the full specification for the input JSON file.
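To sanity-check where a run's outputs landed, a small helper like the following can list them. This is only a sketch: the `cromwell-executions/atac/<workflow-id>/call-<task>/` layout is Cromwell's convention, and the helper name `list_atac_outputs` is hypothetical.

```shell
# Sketch: list per-task output directories of pipeline runs.
# The <workflow-id>/call-<task> layout is Cromwell's convention;
# exact directory names will differ per run.
list_atac_outputs() {
    ls -d "${1:-cromwell-executions}"/atac/*/call-*/ 2>/dev/null
}
```

Run `list_atac_outputs` from the repository root after a pipeline finishes; it accepts an alternative executions directory as an optional first argument.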
-
You can resume a failed pipeline from where it left off by using the `PIPELINE_METADATA` (`metadata.json`) file, which is created for each pipeline run. See here for details. Once you get a new input JSON file from the resumer, use `INPUT=resume.[FAILED_WORKFLOW_ID].json` instead of `INPUT=examples/local/ENCSR356KRQ_subsampled.json`.
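The resume step can be sketched with a small helper. This is a sketch only: the function name `resume_input_from_metadata` is hypothetical, and it assumes the workflow ID sits in the top-level `id` field of the metadata file (which is where Cromwell's metadata JSON keeps it).

```shell
# Sketch: derive the resumer's input filename, resume.<WORKFLOW_ID>.json,
# from a Cromwell metadata file. Cromwell's metadata JSON stores the
# workflow ID in a top-level "id" field.
resume_input_from_metadata() {
    wf_id=$(python -c 'import json,sys; print(json.load(open(sys.argv[1]))["id"])' "$1")
    echo "resume.${wf_id}.json"
}
```

Setting `INPUT=$(resume_input_from_metadata metadata.json)` then points the same `java -jar ... cromwell-34.jar run atac.wdl` command at the resumer's JSON instead of `examples/local/ENCSR356KRQ_subsampled.json`.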