RNA-seq Data Processing by HPC

Quality Check - FastQC

Read Mapping - HISAT2

Transcript-level Quantification and Normalization - Cufflinks and Cuffnorm

Gene-level Expression Quantification - featureCounts

Project Overview:

RNA-seq measures gene expression across the whole transcriptome. This project walks through a standard RNA-seq processing workflow on an HPC cluster, from raw reads to transcript-level and gene-level expression values, using a dataset adapted from Shaw et al. (2015). The dataset contains "treatment" and "control" samples, so the processed output can be used to examine how a specific drug treatment changes gene expression.

Step 1: Check FASTQ Quality with FastQC

Quality control comes first, because problems in the raw reads carry through to every later step. We assess the quality of the sequencing data with FastQC.

Command:

fastqc -o output_directory input_fastq_file(s)
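As a rough sketch of how the command above might be applied to a whole directory of reads, the function below collects the FASTQ files and passes them to FastQC in one call. The function name, the `.fastq.gz` extension, the directory names, and the default thread count are illustrative assumptions, not part of the original workflow.

```shell
# Sketch: run FastQC on every gzipped FASTQ file in one directory.
# run_fastqc, the .fastq.gz extension and the paths are assumptions.
run_fastqc() {
    local fastq_dir="$1" out_dir="$2" threads="${3:-4}"
    # Reuse the positional parameters as the list of input files.
    set -- "$fastq_dir"/*.fastq.gz
    # An unmatched glob stays literal, so check that the first entry exists.
    if [ ! -e "$1" ]; then
        echo "no .fastq.gz files found in $fastq_dir" >&2
        return 1
    fi
    mkdir -p "$out_dir"   # FastQC expects the output directory to exist
    fastqc -t "$threads" -o "$out_dir" "$@"
}

# Example call (hypothetical paths):
# run_fastqc raw_reads qc_reports 4
```

FastQC writes one HTML report and one zip archive per input file into the output directory.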

Step 2: Build HISAT2 Index

Before aligning reads, we build a HISAT2 index from the reference genome. The aligner requires this index, and it only needs to be built once per reference.

Command:

hisat2-build reference_genome.fa index_prefix
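Because the index only has to be built once, a job script can skip this step when the index files are already present. The sketch below assumes a small index, for which hisat2-build writes files named `index_prefix.1.ht2` through `index_prefix.8.ht2`; the function name, paths, and thread count are hypothetical.

```shell
# Sketch: build the HISAT2 index only if it does not exist yet.
# build_index and all paths are illustrative assumptions.
build_index() {
    local genome_fa="$1" index_prefix="$2" threads="${3:-4}"
    # Small indexes are written as index_prefix.1.ht2 ... index_prefix.8.ht2.
    if [ -e "$index_prefix.1.ht2" ]; then
        echo "index $index_prefix already exists, skipping"
        return 0
    fi
    mkdir -p "$(dirname "$index_prefix")"
    hisat2-build -p "$threads" "$genome_fa" "$index_prefix"
}

# Example call (hypothetical paths):
# build_index reference_genome.fa index/genome 8
```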

Step 3: Execute HISAT2 Alignment

Accurate read mapping underpins every downstream expression estimate. Consistent with the methodology of the study, we use HISAT2 to align reads to the reference genome. The -U option passes unpaired (single-end) reads, and -S names the output SAM file.

Command:

hisat2 -x index_prefix -U input_fastq_file(s) -S aligned_reads.sam
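Cufflinks expects coordinate-sorted alignments, so one possible variant of this step pipes the HISAT2 output straight into `samtools sort` and stores a sorted BAM file per sample. This is a sketch under assumptions: samtools is not mentioned in the original workflow, and the function names, paths, and thread count are hypothetical. The `--dta-cufflinks` option asks HISAT2 to report alignments tailored for Cufflinks.

```shell
# Sketch: align one single-end FASTQ file and write a sorted BAM.
# sample_name, align_sample and all paths are illustrative assumptions;
# samtools is an extra tool not named in the original workflow.

# Strip the directory and common FASTQ extensions:
# raw/ctrl_1.fastq.gz becomes ctrl_1
sample_name() {
    local name
    name=$(basename "$1")
    name=${name%.gz}
    name=${name%.fastq}
    name=${name%.fq}
    printf '%s\n' "$name"
}

align_sample() {
    local index_prefix="$1" fastq="$2" out_dir="$3" threads="${4:-4}"
    local sample
    sample=$(sample_name "$fastq")
    mkdir -p "$out_dir"
    # Without -S, hisat2 writes SAM to standard output;
    # the trailing "-" makes samtools sort read from standard input.
    hisat2 -p "$threads" --dta-cufflinks -x "$index_prefix" -U "$fastq" \
        | samtools sort -@ "$threads" -o "$out_dir/$sample.bam" -
}

# Example call (hypothetical paths):
# align_sample index/genome raw_reads/ctrl_1.fastq.gz aligned 8
```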

Step 4: Conduct Transcript Assembly with Cufflinks

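Cufflinks assembles the aligned reads into transcripts and estimates their abundance. It expects the input alignments to be sorted by coordinate, so an unsorted SAM file from the aligner needs to be sorted first.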

Command:

cufflinks -o output_directory -p num_threads aligned_reads.sam
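With several samples, each Cufflinks run needs its own output directory, because every run writes files with the same names (for example `transcripts.gtf`). A minimal sketch follows, assuming sorted BAM input and an optional reference annotation passed with `-g` to guide the assembly; the function name, the annotation file, and the paths are hypothetical.

```shell
# Sketch: run Cufflinks once per sample, each in its own directory.
# run_cufflinks, genes.gtf and all paths are illustrative assumptions.
run_cufflinks() {
    local bam="$1" annotation="$2" out_root="$3" threads="${4:-4}"
    local sample
    sample=$(basename "$bam" .bam)
    mkdir -p "$out_root/$sample"
    # -g uses the annotation as a guide while still assembling transcripts;
    # -G would instead restrict quantification to annotated transcripts.
    cufflinks -p "$threads" -g "$annotation" -o "$out_root/$sample" "$bam"
}

# Example call (hypothetical paths):
# run_cufflinks aligned/ctrl_1.bam genes.gtf cufflinks_out 8
```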

Step 5: Normalize Transcript Abundances with Cuffnorm

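Cuffnorm normalizes transcript-level expression across samples so that values are comparable between libraries. It takes a transcript annotation (GTF) followed by the alignment files for each condition, and writes the normalized tables to the output directory.

Command:

cuffnorm -o output_directory --library-norm-method quartile transcripts.gtf control_reads.bam treatment_reads.bam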
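Cuffnorm takes a GTF file followed by one argument per condition, where replicates of the same condition are joined with commas. The sketch below shows one way to assemble those arguments; the function names, file names, and the `control,treatment` labels passed to `-L` are assumptions chosen to match the dataset description.

```shell
# Sketch: build the per-condition replicate lists and call Cuffnorm.
# join_commas, run_cuffnorm and all file names are illustrative assumptions.

# Join all arguments with commas: a.bam b.bam becomes a.bam,b.bam
join_commas() {
    local IFS=,
    printf '%s\n' "$*"
}

run_cuffnorm() {
    local gtf="$1" out_dir="$2" control_list="$3" treatment_list="$4"
    # -L labels the conditions in the same order as the sample lists.
    cuffnorm -o "$out_dir" -L control,treatment \
        --library-norm-method quartile \
        "$gtf" "$control_list" "$treatment_list"
}

# Example call (hypothetical paths):
# ctrl=$(join_commas aligned/ctrl_1.bam aligned/ctrl_2.bam)
# treat=$(join_commas aligned/treat_1.bam aligned/treat_2.bam)
# run_cuffnorm transcripts.gtf cuffnorm_out "$ctrl" "$treat"
```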

Step 6: Quantify Gene-level Expression with featureCounts

featureCounts quantifies gene-level expression by counting the reads that overlap each gene in an annotation file. The resulting count table is the usual starting point for differential expression analysis, functional annotation, and other downstream analyses.

Command:

featureCounts -T num_threads -a annotation.gtf -o gene_counts.txt bam_file(s)
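The featureCounts output begins with a comment line that records the command, followed by a header row with Geneid, Chr, Start, End, Strand, and Length columns and then one count column per input file. Downstream tools usually want only the gene identifiers and counts, so a small helper like the one sketched below can trim the table. The function name and file names are hypothetical.

```shell
# Sketch: reduce featureCounts output to a plain gene-by-sample matrix.
# counts_matrix and the file names are illustrative assumptions.
counts_matrix() {
    # Drop the leading "#" comment line, then keep column 1 (Geneid)
    # and columns 7 onward (one count column per input file).
    grep -v '^#' "$1" | cut -f 1,7-
}

# Example call (hypothetical paths):
# counts_matrix gene_counts.txt > gene_counts_matrix.tsv
```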

Summary:

This workflow takes raw RNA-seq reads through quality control, alignment, transcript-level quantification and normalization, and gene-level read counting. The resulting expression tables can then be compared between the treatment and control samples.

References:

Shaw, P., Chaotheing, S., Kaewprommal, P., Piriyapongsa, J., Wongsombat, C., Suwannakitti, N., Koonyosying, P., Uthaipibull, C., Yuthavong, Y., & Kamchonwongpaisan, S. (2015). Plasmodium parasites mount an arrest response to dihydroartemisinin, as revealed by whole transcriptome shotgun sequencing (RNA-seq) and microarray study. BMC Genomics, 16(1). https://doi.org/10.1186/s12864-015-2040-0

Blog:

https://ssidmarine.wordpress.com/2023/08/12/rna-seq-data-processing-by-hpc/

Access data:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62136

Some output data are available in the data folder.
