- Download Cromwell to your `$HOME` directory.
  ```bash
  $ cd
  $ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
  $ chmod +rx cromwell-34.jar
  ```
- Git clone this pipeline and move into its directory.
  ```bash
  $ cd
  $ git clone https://github.com/ENCODE-DCC/atac-seq-pipeline
  $ cd atac-seq-pipeline
  ```
- Download a SUBSAMPLED (1/400) paired-end sample of ENCSR356KRQ.
  ```bash
  $ wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ/ENCSR356KRQ_fastq_subsampled.tar
  $ tar xvf ENCSR356KRQ_fastq_subsampled.tar
  ```
- Download the pre-built genome database for hg38.
  ```bash
  $ wget https://storage.googleapis.com/encode-pipeline-genome-data/test_genome_database_hg38_atac.tar
  $ tar xvf test_genome_database_hg38_atac.tar
  ```
- Get information about the parallel environments (PEs) on your SGE system. If your system doesn't have a PE, ask your admin to add one named `shm` to the SGE master.
  ```bash
  $ qconf -spl
  ```
Our pipeline supports both Conda and Singularity.
- Install Conda. Skip this if you already have an equivalent Conda alternative (e.g. Anaconda Python). Download and run the installer, and agree to the license terms by typing `yes`. The installer will ask about the installation location. On Stanford clusters (Sherlock and SCG4), we recommend installing it outside of your `$HOME` directory, since its filesystem is slow and has very limited space. At the end of the installation, choose `yes` to add Miniconda's binary to `$PATH` in your BASH startup script.
  ```bash
  $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  $ bash Miniconda3-latest-Linux-x86_64.sh
  ```
- Install Conda dependencies.
  ```bash
  $ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
  $ bash conda/install_dependencies.sh
  ```
- Run a pipeline for the test sample. If the parallel environment (PE) found in step 5) has a name other than `shm`, edit the following shell script to change the PE name.
  ```bash
  $ qsub examples/local/ENCSR356KRQ_subsampled_sge_conda.sh
  ```
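Changing the PE name amounts to rewriting the `-pe` request line in the submit script. A minimal sketch with `sed`, assuming the script requests the PE with a line like `#$ -pe shm 2` and that your PE is named `smp` (both the line and the PE name here are hypothetical; adjust to your system):

```bash
# Sketch: rewrite the PE request line of a qsub script.
# 'smp' is a hypothetical PE name; the '#$ -pe shm 2' input line is assumed.
sed 's/-pe shm/-pe smp/' <<'EOF'
#$ -pe shm 2
EOF
```

To edit the script in place instead of printing to stdout, run `sed -i 's/-pe shm/-pe YOUR_PE/' examples/local/ENCSR356KRQ_subsampled_sge_conda.sh`.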
- CHECK YOUR SINGULARITY VERSION FIRST AND UPGRADE IT TO A VERSION `>=2.5.2`, OR THE PIPELINE WILL NOT WORK CORRECTLY.
  ```bash
  $ singularity --version
  ```
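If you want to script this check, one sketch is to compare version strings with `sort -V` (GNU coreutils); the `VERSION` value below is a hypothetical stand-in for the number reported by `singularity --version`:

```bash
REQUIRED=2.5.2
VERSION=2.4.0  # example value; substitute the version reported by `singularity --version`

# sort -V orders version strings numerically; if REQUIRED sorts first,
# VERSION is at least as new as REQUIRED.
if [ "$(printf '%s\n' "$REQUIRED" "$VERSION" | sort -V | head -n1)" = "$REQUIRED" ]; then
    echo "OK: $VERSION >= $REQUIRED"
else
    echo "UPGRADE: $VERSION < $REQUIRED"
fi
```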
- Pull a Singularity container for the pipeline. This will pull the pipeline's Docker container first and build a Singularity image from it under `~/.singularity`.
  ```bash
  $ mkdir -p ~/.singularity && cd ~/.singularity && SINGULARITY_CACHEDIR=~/.singularity SINGULARITY_PULLFOLDER=~/.singularity singularity pull --name atac-seq-pipeline-v1.3.0.simg -F docker://quay.io/encode-dcc/atac-seq-pipeline:v1.3.0
  ```
- Run a pipeline for the test sample. If the parallel environment (PE) found in step 5) has a name other than `shm`, edit the following shell script to change the PE name.
  ```bash
  $ qsub examples/local/ENCSR356KRQ_subsampled_sge_singularity.sh
  ```
- It will take about an hour. You will be able to find all outputs under `cromwell-executions/atac/[RANDOM_HASH_STRING]/`. See the output directory structure for details.
- See the full specification for the input JSON file.
- You can resume a failed pipeline from where it left off by using its `PIPELINE_METADATA` (`metadata.json`) file. This file is created for each pipeline run. See here for details. Once you get a new input JSON file from the resumer, edit your shell script (`examples/local/ENCSR356KRQ_subsampled_sge_*.sh`) to use `INPUT=resume.[FAILED_WORKFLOW_ID].json` instead of `INPUT=examples/...`.
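The edit to the shell script boils down to replacing one variable assignment. As a sketch (the `INPUT=examples/...` input line below is a hypothetical excerpt from the submit script, not a path confirmed by this guide):

```bash
# Sketch: point INPUT at the resumer's JSON instead of the example input.
# The heredoc stands in for the submit script's original INPUT line.
sed 's|^INPUT=examples/.*|INPUT=resume.[FAILED_WORKFLOW_ID].json|' <<'EOF'
INPUT=examples/local/ENCSR356KRQ_subsampled.json
EOF
```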
- IF YOU WANT TO RUN PIPELINES WITH YOUR OWN INPUT DATA/GENOME DATABASE, PLEASE ADD THEIR DIRECTORIES TO `workflow_opts/sge.json`. For example, if you have input FASTQs in `/your/input/fastqs/` and a genome database installed in `/your/genome/database/`, then add `/your/` to `singularity_bindpath`. You can also define multiple directories there; it's comma-separated.
  ```json
  {
      "default_runtime_attributes" : {
          "singularity_container" : "~/.singularity/atac-seq-pipeline-v1.3.0.simg",
          "singularity_bindpath" : "/your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR2,..."
      }
  }
  ```
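If you keep your data directories in a shell variable, the comma-separated `singularity_bindpath` value can be built mechanically. A minimal sketch, with hypothetical directory names:

```bash
# Sketch: join hypothetical data directories into a comma-separated bindpath.
BINDPATH="/your/"
for d in /data/fastqs /data/genome; do
    BINDPATH="${BINDPATH},${d}"
done
echo "$BINDPATH"   # -> /your/,/data/fastqs,/data/genome
```

Paste the resulting string into the `singularity_bindpath` field of `workflow_opts/sge.json`.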