Introduction to Peak Analysis

Learning Objectives

  • Describe peak data and different file formats generated from peak calling algorithms
  • Evaluate the metrics used to assess the quality of peak calls
  • Compare peak calls across samples within a dataset
  • Create visualizations to evaluate peak annotations
  • Evaluate differentially enriched regions between two sample groups

Installations

On your desktop

  1. R
  2. RStudio
  3. Integrative Genomics Viewer (IGV)
  4. The listed R packages

On your HPCC (if not using Harvard's O2 cluster)

Required

  1. Nextflow version 24.11.0-edge

Alternative to Nextflow

  1. samtools version 1.15.1
  2. bedtools version 2.30.0
  3. Picard version 2.27.5
  4. phantompeakqualtools version 1.2.2
  5. deepTools version 3.5.6
  6. bedGraphToBigWig version 302.1

NOTE: If you are not working on the O2 cluster and have different versions of these programs installed, the provided commands may still work. However, this workshop was designed around the versions listed above, so you may need to adjust some commands if your versions differ.
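If you are setting up your own HPCC, it can help to check what is installed before starting the lessons. The sketch below is not part of the workshop materials. It records the versions listed above, reports which tools are not on your `PATH`, and compares a version string that you supply against the workshop version. The executable names in `ASSUMED_EXECUTABLES` are assumptions (for example, `bamCoverage` stands in for deepTools). Picard and phantompeakqualtools are left out of the `PATH` check because they are often run as a `.jar` file or an R script rather than a plain executable. The helper names `parse_version`, `compare_versions`, and `missing_tools` are hypothetical.

```python
import shutil

# Versions this workshop was designed on (copied from the lists above).
EXPECTED_VERSIONS = {
    "Nextflow": "24.11.0-edge",
    "samtools": "1.15.1",
    "bedtools": "2.30.0",
    "Picard": "2.27.5",
    "phantompeakqualtools": "1.2.2",
    "deepTools": "3.5.6",
    "bedGraphToBigWig": "302.1",
}

# ASSUMPTION: executable names as they commonly appear on PATH.
# Adjust these for your own cluster or module system.
ASSUMED_EXECUTABLES = {
    "Nextflow": "nextflow",
    "samtools": "samtools",
    "bedtools": "bedtools",
    "deepTools": "bamCoverage",
    "bedGraphToBigWig": "bedGraphToBigWig",
}


def parse_version(text):
    """Turn '24.11.0-edge' into (24, 11, 0); any '-suffix' is ignored."""
    numeric_part = text.split("-")[0]
    return tuple(int(piece) for piece in numeric_part.split("."))


def compare_versions(tool, installed):
    """Return 'match', 'older', or 'newer' relative to the workshop version."""
    expected = parse_version(EXPECTED_VERSIONS[tool])
    found = parse_version(installed)
    if found == expected:
        return "match"
    return "older" if found < expected else "newer"


def missing_tools():
    """List the tools whose assumed executable is not found on PATH."""
    return [
        tool
        for tool, exe in ASSUMED_EXECUTABLES.items()
        if shutil.which(exe) is None
    ]


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"{tool}: not found on PATH (expected {EXPECTED_VERSIONS[tool]})")
    # Pass in the version string that your own installation reports.
    print("samtools 1.17 is", compare_versions("samtools", "1.17"))
```

A result of `older` or `newer` does not mean the commands will fail. It only flags the tools whose commands may need adjusting.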

Lessons

  1. Workflow overview: From sequenced reads to peaks
  2. Existing workflows for ChIP-seq analysis
  3. Understanding peaks and peak file formats
  4. Assessing peak quality metrics
  5. Assessing sample similarity and identifying potential outliers
  6. Concordance across replicates using peak overlaps
  7. Peak annotation and visualization using ChIPseeker
  8. Differential enrichment analysis using DiffBind
  9. Peak visualization using IGV
  10. Annotation and functional analysis of DE regions
  11. Motif analysis/discovery

NOTE: If you aren't working on Harvard's O2 cluster, the directory structure of your HPCC is likely different, and you will need to modify the paths in the lessons to match it.

Answer key


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.