Cancer-CRISPR is a tool to map sgRNA sequences to a genome of interest, retrieve genomic information about the mapped regions, and extract expression data from TCGA datasets for the genes the sgRNAs map to.

erkutilaslan/cancer_crispr

Cancer-CRISPR

This repository contains the Nextflow pipeline for the Cancer-CRISPR tool. The pipeline maps the sgRNA sequences in the library.fa file to hg38 and retrieves the chromosome, strand, and start and end positions of each mapped sgRNA. It then checks whether the gene each sgRNA maps to matches the gene name provided for that sgRNA. Finally, it downloads gene expression matrices from two TCGA-BRCA datasets and reports the TPM expression values produced by the STAR - Counts workflow. Running the pipeline creates two tables in the results directory: the mapping table (reads_mapped_table.tsv) and the TCGA expression table (tcga_data_table.tsv).
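The gene-name check amounts to a row-by-row comparison between the gene supplied for each sgRNA and the gene found at its mapped locus. The sketch below is a minimal illustration, not the pipeline's own code: it assumes a hypothetical tab-separated layout with the sgRNA ID, the provided gene name and the mapped gene in columns 1 to 3, and the helper name check_gene_match is made up for this example.

```shell
# Hypothetical helper: compare the provided gene (column 2) with the gene at the
# mapped locus (column 3) and append a MATCH/MISMATCH flag as column 4.
# The column layout is an assumption; the real pipeline's tables may differ.
check_gene_match() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    {
      status = ($2 == $3) ? "MATCH" : "MISMATCH"
      print $1, $2, $3, status
    }' "$@"
}

# Toy input: sg1 maps to its intended gene, sg2 does not.
printf 'sg1\tTP53\tTP53\nsg2\tBRCA1\tNBR2\n' | check_gene_match
```

Under this rule an empty mapped-gene field (for example, an intergenic hit) is also reported as MISMATCH.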

This pipeline uses a prebuilt hg38 index, which the user must place in the data/index/ directory.
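Because the index has to be in place before the pipeline starts, a small pre-flight check can save a failed run. This is a sketch under assumptions: the helper name index_ready is hypothetical, and it only checks that the directory exists and is non-empty, not that its files form a valid hg38 index for the aligner the pipeline uses.

```shell
# Hypothetical pre-flight helper: succeeds only if the given directory exists
# and contains at least one entry. It does not validate the index contents.
index_ready() {
  dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if index_ready data/index; then
  echo "Index files found in data/index/"
else
  echo "No index files in data/index/; add the prebuilt hg38 index before running." >&2
fi
```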

Dependencies

This workflow is written in Nextflow DSL1, so it requires Nextflow 22.10.x or earlier; later releases dropped DSL1 support.
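Nextflow's NXF_VER environment variable selects which release the launcher runs, which is one way to stay on a DSL1-capable version without downgrading a system-wide install. In this sketch the release 22.10.1 is an assumed example from the 22.10.x line, and NXF_DEFAULT_DSL=1 is only needed if cancer_crispr.nf does not already declare nextflow.enable.dsl=1 (an assumption, since the script itself is not shown here).

```shell
# Pin the Nextflow release for this shell session.
# 22.10.1 is an example; any DSL1-capable 22.10.x release is assumed to work.
export NXF_VER=22.10.1

# Default to DSL1 in case the script does not declare its DSL version itself.
export NXF_DEFAULT_DSL=1
```

With both variables exported, running nextflow run cancer_crispr.nf in the same shell launches the pipeline under the pinned release, which Nextflow downloads on first use.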

The software required to run the pipeline is listed in the env_explicit.yml file.

A Conda environment can be built from this file with the command:

conda env create -f env_explicit.yml

R packages:

This workflow uses the TCGAbiolinks package to retrieve TCGA gene expression data and biomaRt to map Ensembl gene IDs to HGNC gene symbols. To install these packages, together with dplyr, run the following code in an R session:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("dplyr","TCGAbiolinks","biomaRt"))
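Before running, it can help to confirm that the command-line tools are reachable from the active environment. The helper below is a hypothetical sketch; the tool list passed to it (nextflow, conda, Rscript) is an assumption about what a typical setup needs, not a list taken from env_explicit.yml.

```shell
# Hypothetical helper: report every named command that is not on PATH and
# return a non-zero status if any of them is missing.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, require_tools nextflow conda Rscript prints each missing tool to stderr and fails if any is absent. It does not check the R packages themselves; loading dplyr, TCGAbiolinks and biomaRt in an R session is the direct way to confirm those.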

Usage

  • Place the prebuilt hg38 index files in the data/index/ directory.

  • Run the pipeline:

nextflow run cancer_crispr.nf
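Once the run finishes, the two output tables should be present in the results directory. The function below is a hypothetical convenience sketch that only checks that the files exist and are non-empty and reports their line counts; it assumes the default results/ location and says nothing about the column layout of either table.

```shell
# Hypothetical post-run check for the two output tables named in this README.
# Defaults to ./results; pass another directory as the first argument.
check_results() {
  dir="${1:-results}"
  status=0
  for f in reads_mapped_table.tsv tcga_data_table.tsv; do
    if [ -s "$dir/$f" ]; then
      echo "$f: $(wc -l < "$dir/$f") lines"
    else
      echo "$f: missing or empty" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running check_results after the pipeline (or check_results path/to/results for a different location) returns a non-zero status if either table is missing or empty.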
