cram2fastq

A python script to retrieve and convert crams from irods to fastq files. For internal use within the sanger HPC environment.

Installation

pip install cram2fastq
# or
pip install git+https://github.com/clatworthylab/cram2fastq.git

Before using the tool, check that:

samtools is available on your $PATH. If not:

export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH

REF_PATH is set as well. If not:

export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s

If setting it up for the first time, just do this once:

echo 'export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH' >> ~/.bashrc
echo 'export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s' >> ~/.bashrc
source ~/.bashrc

Instructions

usage: cram2fastq.py [-h] [--meta META] [--study STUDY] [--outpath OUTPATH] [--bulk] [--bsub] [--DNAP] [--queue QUEUE] [--ncpu NCPU] [--mem MEM] [--dryrun]

optional arguments:
  -h, --help         show this help message and exit
  --meta META        txt/csv file containing the SANGER SAMPLE IDS as per manifest as a separate line for each sample.
  --study STUDY      Study ID. This will be the name of the output folder.
  --outpath OUTPATH  Path to the directory holding the converted files.
  --bulk             If passed, assume file is bulk data rather than 10x data.
  --bsub             If passed, submits as job to bsub.
  --DNAP             If passed, treats samples as created using semiautomated pipeline from DNAP (i.e. same ID for GEX/TCR/BCR). Output will be separated as folders.
  --queue QUEUE      bsub queue. Only works if --bsub is passed.
  --ncpu NCPU        bsub ncpu. Only works if --bsub is passed.
  --mem MEM          bsub memory. Only works if --bsub is passed.
  --dryrun           If passed, prints command rather than actually run.

After installation, it is as easy as doing:

cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk

Adding the --bsub option will submit this as a job if you have many samples to process.

cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk --bsub

sampleids.txt is simply a single column .txt or .csv file with the sanger sample ids (no header). The IDs should correspond to SANGER SAMPLE ID column in the manifest.

For example:

SangerSampleID00000001
SangerSampleID00000002
SangerSampleID00000003

Output

Once it is all finished, a folder (with the name as whatever you provide for --study) will be created under --outpath with the appropriate .cram files converted to .fastq.gz files.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
.github/workflows		.github/workflows
cram2fastq		cram2fastq
.gitattributes		.gitattributes
.gitignore		.gitignore
DESCRIPTION.md		DESCRIPTION.md
README.md		README.md
conftest.py		conftest.py
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cram2fastq

Installation

Instructions

Output

About

Releases 4

Packages

Contributors 2

Languages

clatworthylab/cram2fastq

Folders and files

Latest commit

History

Repository files navigation

cram2fastq

Installation

Instructions

Output

About

Resources

Stars

Watchers

Forks

Releases 4

Packages 0

Contributors 2

Languages

Packages