† Co-first author
1MD/PhD Medical Scientist Training Program, Penn State College of Medicine, Hershey, PA, USA.
2Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, 14853, USA
Correspondence: [email protected]
PMID : XXXXXXXX
GEO ID : GSE266547
Genes are regulated by transcription factors (TFs) bound to DNA sites (TFBS). Unbound sites are often nucleosomal and inaccessible. Current assays cannot address whether TFs encounter phasing of the DNA helix on nucleosomal surfaces across a natural genomic context, and how this may change upon TF binding. Here we use an endonuclease, either alone or coupled to ChIP-exo, to measure the genomic rotational and translational (positional) phasing of nucleosomal DNA at unbound and TF-bound TFBSs in human cells. Unbound sites had a preferred rotational phase, but generally lacked a translational phase. In contrast, TF/TFBS complexes were often engaged with an adjacent translationally and rotationally phased nucleosome. Thus, a few molecular themes may govern how TFs engage nucleosomes.
To recreate the figures for this manuscript, execute the scripts in each directory in numerical order. Each directory's README includes more specific details on execution. Work through the directories in the following order: 00_Download_and_Preprocessing, 01_Run_GenoPipe, 02_Call_Nucleosomes, 03_Call_JASPAR, 04_Call_Motifs, X_Bulk_Processing, and finally Z_Figures.
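The directory order above can be sketched as a small shell helper that lists, for each step directory, the scripts it contains in numerical order. This is only an illustration: the function name `list_step_scripts` is hypothetical, and it assumes that script filenames begin with a number. It prints names rather than running anything, since each directory's README gives the actual execution details.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the scripts in each
# step directory in numerical order, following the directory order in the README.

STEP_DIRS=(
  00_Download_and_Preprocessing
  01_Run_GenoPipe
  02_Call_Nucleosomes
  03_Call_JASPAR
  04_Call_Motifs
  X_Bulk_Processing
  Z_Figures
)

list_step_scripts() {
  # $1: repository root (defaults to the current directory)
  local root="${1:-.}"
  local dir
  for dir in "${STEP_DIRS[@]}"; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir" >&2
      continue
    fi
    echo "== $dir =="
    # Assumption: script names start with a number (e.g. 1_something.sh),
    # so a numeric sort on the basename gives the intended run order.
    find "$root/$dir" -maxdepth 1 -type f -name '[0-9]*' -exec basename {} \; | sort -n
  done
}
```

Calling `list_step_scripts /path/to/repo` from the repository checkout gives a quick overview of what needs to be run and in what order before consulting the per-directory READMEs.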
Use the following conda command to create an environment with the required dependencies:
conda create -n bx -c bioconda -c conda-forge bedtools bowtie2 bwa cutadapt meme opencv pandas samtools scipy sra-tools wget pybigwig
The script that runs genetrack requires Python 2, so a separate environment is needed for it. Create that environment as follows:
conda create -n genetrack -c conda-forge -c bioconda python=2.7 numpy
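After creating and activating an environment (for example with `conda activate bx`), it can be useful to confirm that the command-line tools are actually on the `PATH` before starting a long preprocessing run. The sketch below is an assumption-laden convenience, not part of the repository: `check_tools` is a hypothetical function name, and it only uses the standard `command -v` shell builtin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which of the given
# executables cannot be found on PATH. Returns 0 if all are found, 1 otherwise.

check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage after `conda activate bx` (executable names taken from the
# packages in the conda create command above):
#   check_tools bedtools bowtie2 bwa cutadapt meme samtools wget
```

The Python packages in the environment (pandas, scipy, opencv, pybigwig) are not executables, so they are not covered by this check.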
00_Download_and_Preprocessing: Perform the preprocessing steps, including alignment of raw sequencing data from both novel and previously published datasets.
01_Run_GenoPipe: Perform quality control for genetic background on these data by running GenoPipe on the aligned BAMs.
02_Call_Nucleosomes: Call nucleosome positions and identify TSS and +1 nucleosome reference points with different sorts.
03_Call_JASPAR: Call JASPAR motifs and subset to "bound" sites using ENCODE peak data.
04_Call_Motifs: Build de novo sequence-specific transcription factor (ssTF) motif reference points using Benzonase ChIP-exo data.
X_Bulk_Processing: With the BAM and BED files built from the scripts in the above directories, perform bulk read pileups for heatmaps and composites.
Z_Figures: Copy and organize results from bulk processing into figure-specific directories corresponding to subfigures in the manuscript. This directory also includes custom, one-off scripts for analyses that didn't need bulk-style execution.
Store large files to be globally accessed by the scripts in each directory.
Generalized scripts and executables for global access by each of the numbered directories.