Added preprocessing files
Mittmich committed Jun 16, 2020
1 parent 7d2c668 commit 70413dc
Showing 6 changed files with 110 additions and 0 deletions.
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

seqDir="/groups/gerlich/experiments/Experiments_004800/004812/Sequencing_data/Pooled_FC_1_2_3_4/cooler/"
logDir="/groups/gerlich/experiments/Experiments_004800/004812/clusterlog/"

jobName="Merge_cis_trans_cool_exp4812"
logFile="$logDir${jobName}.log"
files[0]="${seqDir}G2.fc_1_2_3_4.wOldG2.cis.1000.cool"
files[1]="${seqDir}G2.fc_1_2_3_4.wOldG2.trans.1000.cool"
tempScript="cooler merge ${seqDir}G2.fc_1_2_3_4.wOldG2.cis_and_trans.1000.cool ${files[*]}"
echo "$tempScript"
sbatch -c 1 --mem 20G --qos=short --job-name "$jobName" --output "$logFile" --wrap="$tempScript"
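
The same submission could also be assembled from Python as an argument list, which avoids a second round of shell quoting for the `--wrap` string. The following is a minimal sketch under stated assumptions: the helper name `build_sbatch_argv` is hypothetical, the directories are shortened placeholders rather than the real experiment paths, and only the `sbatch` options that appear in the script above are used. Nothing is submitted; the command is only printed.

```python
import shlex


def build_sbatch_argv(job_name, log_file, command, cpus=1, mem="20G", qos="short"):
    """Return the sbatch argument list for a wrapped one-line command.

    Hypothetical helper: it mirrors the options used in the shell script.
    """
    return [
        "sbatch",
        "-c", str(cpus),
        "--mem", mem,
        f"--qos={qos}",
        "--job-name", job_name,
        "--output", log_file,
        f"--wrap={command}",
    ]


# Placeholder directory (assumption); the real script uses the full experiment path.
seq_dir = "cooler/"
inputs = [
    f"{seq_dir}G2.fc_1_2_3_4.wOldG2.cis.1000.cool",
    f"{seq_dir}G2.fc_1_2_3_4.wOldG2.trans.1000.cool",
]
output = f"{seq_dir}G2.fc_1_2_3_4.wOldG2.cis_and_trans.1000.cool"

# cooler merge takes the output file first, then the input coolers.
merge_cmd = " ".join(["cooler", "merge", output, *inputs])

argv = build_sbatch_argv(
    "Merge_cis_trans_cool_exp4812",
    "clusterlog/Merge_cis_trans_cool_exp4812.log",  # placeholder log path
    merge_cmd,
)

# Dry run: print a copy-pasteable shell line instead of submitting.
print(shlex.join(argv))
```

To actually submit, the list could be handed to `subprocess.run(argv, check=True)`.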

@@ -0,0 +1,44 @@
"""Script used to transfer balancing weights from
G2.cis_and_trans.1000.mcool to the cis and trans G2 samples"""

import h5py
import os

# define functions


def transferWeights(source, target, binsize):
    # get weights from source
    with h5py.File(source, 'r') as f:
        weights = f["resolutions"][binsize]["bins"]["weight"]
        weightArray = weights[:]
    # write weights into target
    with h5py.File(target, 'r+') as f:
        try:
            targetWeights = f["resolutions"][binsize]["bins"]["weight"]
            targetWeights[...] = weightArray
        except KeyError:
            f["resolutions"][binsize]["bins"]["weight"] = weightArray


# set working directory

os.chdir("/groups/gerlich/experiments/Experiments_004800/004812/Sequencing_data/Pooled_FC_1_2_3_4/cooler/")


# transfer weights for cis and trans

targetDir = "/groups/gerlich/experiments/Experiments_004800/004812/Sequencing_data/Pooled_FC_1_2_3_4/cooler/"


# get different bins
bins = "1000,2000,4000,5000,6000,8000,10000,20000,30000,40000,50000,100000,120000,150000,160000,180000,200000,500000,1000000,5000000".split(",")
# transfer weights trans
source = 'G2.fc_1_2_3_4.wOldG2.cis_and_trans.1000.mcool'
target = f'{targetDir}G2.fc_1_2_3_4.wOldG2.trans.1000.mcool'
for binSize in bins:
    transferWeights(source, target, binSize)
# transfer weights cis
target = f'{targetDir}G2.fc_1_2_3_4.wOldG2.cis.1000.mcool'
for binSize in bins:
    transferWeights(source, target, binSize)
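
The `try`/`except KeyError` in `transferWeights` covers two cases: the target already has a `weight` column, which is overwritten in place, or it has none, in which case the column is created. A minimal sketch of that control flow follows. It uses plain nested dictionaries as a stand-in for the HDF5 groups so that it runs without h5py or real `.mcool` files; the stand-in, the snake_case name `transfer_weights`, and the bin-count check are assumptions added for illustration and are not part of the committed script.

```python
def transfer_weights(source, target, binsize):
    """Copy the 'weight' column for one resolution from source to target.

    Nested dicts stand in for the HDF5 groups of an .mcool file
    (assumption for illustration; the committed script uses h5py).
    """
    weights = list(source["resolutions"][binsize]["bins"]["weight"])
    bins = target["resolutions"][binsize]["bins"]
    try:
        existing = bins["weight"]
    except KeyError:
        # Target was never balanced: create the weight column.
        bins["weight"] = weights
    else:
        # Target already has weights: overwrite them in place.
        # Added check (not in the original): both files must share a bin table.
        if len(existing) != len(weights):
            raise ValueError("source and target have different numbers of bins")
        existing[:] = weights


# Toy data: three bins at a single resolution.
pooled = {"resolutions": {"1000": {"bins": {"weight": [0.5, 0.25, 2.0]}}}}
cis = {"resolutions": {"1000": {"bins": {}}}}  # no weight column yet
trans = {"resolutions": {"1000": {"bins": {"weight": [1.0, 1.0, 1.0]}}}}  # stale weights

for tgt in (cis, trans):
    transfer_weights(pooled, tgt, "1000")

print(cis["resolutions"]["1000"]["bins"]["weight"])
print(trans["resolutions"]["1000"]["bins"]["weight"])
```

After the loop both targets carry the pooled weights, which is the point of the transfer: cis-sister and trans-sister maps are normalised with one shared set of weights.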
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

# set paths
coolP="/groups/gerlich/experiments/Experiments_004800/004812/Sequencing_data/Pooled_FC_1_2_3_4/cooler/"
logDir="/groups/gerlich/experiments/Experiments_004800/004812/clusterlog/"

jobName="Zoomify_balance_all"
logFile="$logDir${jobName}.log"
tempScript="cooler zoomify ${coolP}G2.fc_1_2_3_4.wOldG2.all.1000.cool -n 22 \
    -r 1000,2000,4000,5000,6000,8000,10000,20000,30000,40000,50000,100000,120000,150000,160000,180000,200000,500000,1000000,5000000 --balance \
    --balance-args '--ignore-diags 1 --mad-max 5 --max-iters 500'"
echo "$tempScript"
sbatch -c 24 --mem 40G --partition=m --qos=short --job-name "$jobName" --output "$logFile" --wrap="$tempScript"
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

# set paths
coolP="/groups/gerlich/experiments/Experiments_004800/004812/Sequencing_data/Pooled_FC_1_2_3_4/cooler/"
logDir="/groups/gerlich/experiments/Experiments_004800/004812/clusterlog/"

jobName="Zoomify_balance_cis_and_trans"
logFile="$logDir${jobName}.log"
tempScript="cooler zoomify ${coolP}G2.fc_1_2_3_4.wOldG2.cis_and_trans.1000.cool -n 22 \
    -r 1000,2000,4000,5000,6000,8000,10000,20000,30000,40000,50000,100000,120000,150000,160000,180000,200000,500000,1000000,5000000 --balance \
    --balance-args '--ignore-diags 1 --mad-max 5 --max-iters 500'"
echo "$tempScript"
sbatch -c 24 --mem 40G --partition=m --qos=short --job-name "$jobName" --output "$logFile" --wrap="$tempScript"
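
The resolution list passed to `-r` is repeated verbatim in the zoomify scripts and in the Python weight-transfer script, so a typo in one copy would leave a resolution without transferred weights. A small sketch of a sanity check on that string follows. The function name `parse_resolutions` is hypothetical, and the checks are choices made for this sketch: it assumes that every target resolution should be an integer multiple of the 1000 bp base bin size, and it treats "unique and ascending" as housekeeping rather than a documented requirement.

```python
RESOLUTIONS = (
    "1000,2000,4000,5000,6000,8000,10000,20000,30000,40000,50000,100000,"
    "120000,150000,160000,180000,200000,500000,1000000,5000000"
)
BASE_BINSIZE = 1000  # bin size of the *.1000.cool input files


def parse_resolutions(text, base):
    """Parse a comma-separated -r string and run basic sanity checks."""
    values = [int(item) for item in text.split(",")]
    # Housekeeping check chosen for this sketch: no duplicates, ascending order.
    if values != sorted(set(values)):
        raise ValueError("resolutions must be unique and ascending")
    # Assumed requirement: coarser bins are built from whole base bins.
    bad = [v for v in values if v % base != 0]
    if bad:
        raise ValueError(f"not multiples of {base}: {bad}")
    return values


resolutions = parse_resolutions(RESOLUTIONS, BASE_BINSIZE)
print(len(resolutions), "resolutions, coarsest:", resolutions[-1])
```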
17 changes: 17 additions & 0 deletions preprocessing/Zoomify_cis_trans_wOldG2_exp4812_fc_1_2_3_4.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash

# set paths
coolP="/groups/gerlich/experiments/Experiments_004800/004812/Sequencing_data/Pooled_FC_1_2_3_4/cooler/"
logDir="/groups/gerlich/experiments/Experiments_004800/004812/clusterlog/"

ctypes="cis trans"


for ctype in $ctypes; do
    jobName="Zoomify_${ctype}_${barcode}"  # NOTE: barcode is not set in this script, so it expands to an empty string
    logFile="$logDir${jobName}.log"
    tempScript="cooler zoomify ${coolP}G2.fc_1_2_3_4.wOldG2.${ctype}.1000.cool -n 1 \
        -r 1000,2000,4000,5000,6000,8000,10000,20000,30000,40000,50000,100000,120000,150000,160000,180000,200000,500000,1000000,5000000"
    echo "$tempScript"
    sbatch -c 2 --mem 15G --qos=short --job-name "$jobName" --output "$logFile" --wrap="$tempScript"
done
10 changes: 10 additions & 0 deletions readme.md
@@ -2,6 +2,16 @@

This repository contains notebooks for the analysis of scsHi-C data.

## Preprocessing

Preprocessing of scsHi-C experiments was done as follows: first, the [scshic_pipeline](https://github.com/gerlichlab/scshic_pipeline) was applied to the raw sequencing data. The resulting cooler files were then balanced in the steps below; each linked script is the example used for the G2 WT data.

- All-contact coolers were zoomified and balanced conventionally (see this [script](https://github.com/Mittmich/scsHiCanalysis/blob/master/preprocessing/Zoomify_and_balance_all_wOldG2_exp4812_ignore_diag_1_fc_1_2_3_4.sh)).
- Cis-sister and trans-sister contacts were pooled into a single cooler (see this [script](https://github.com/Mittmich/scsHiCanalysis/blob/master/preprocessing/Merge_coolers_cis_trans_forBalance_w_oldG2_fc_1_2_3_4.sh)).
- The pooled cis-and-trans sister contacts were then zoomified and balanced (see this [script](https://github.com/Mittmich/scsHiCanalysis/blob/master/preprocessing/Zoomify_and_balance_cis_and_trans_wOldG2_exp4812_ignore_diag_1_fc_1_2_3_4.sh)).
- Finally, the cis-sister and trans-sister coolers were zoomified separately, without balancing (see this [script](https://github.com/Mittmich/scsHiCanalysis/blob/master/preprocessing/Zoomify_cis_trans_wOldG2_exp4812_fc_1_2_3_4.sh)), and the weights obtained from the pooled cis-and-trans cooler were transferred to the cis-sister and trans-sister files (see this [script](https://github.com/Mittmich/scsHiCanalysis/blob/master/preprocessing/Transfer_weights_from_cis_and_trans_wOldG2_fc_1_2_3_4.py)).
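
The order of these steps matters: the pooled cooler has to exist and be balanced before its weights can be copied, and the cis and trans files need their multi-resolution structure before they can receive them. The following sketch lists the commands in dependency order. It is an illustration only: directory prefixes are dropped, the helper `zoomify` and the step labels are hypothetical, the last entry assumes the transfer script is run directly with `python`, and the flags are copied from the scripts in this commit.

```python
PREFIX = "G2.fc_1_2_3_4.wOldG2"
RES = (
    "1000,2000,4000,5000,6000,8000,10000,20000,30000,40000,50000,100000,"
    "120000,150000,160000,180000,200000,500000,1000000,5000000"
)
BALANCE = "--balance --balance-args '--ignore-diags 1 --mad-max 5 --max-iters 500'"


def zoomify(kind, nproc, balance):
    """Build the cooler zoomify command line for one contact type (hypothetical helper)."""
    cmd = f"cooler zoomify {PREFIX}.{kind}.1000.cool -n {nproc} -r {RES}"
    return f"{cmd} {BALANCE}" if balance else cmd


steps = [
    ("balance all contacts", zoomify("all", 22, balance=True)),
    (
        "pool cis and trans",
        f"cooler merge {PREFIX}.cis_and_trans.1000.cool "
        f"{PREFIX}.cis.1000.cool {PREFIX}.trans.1000.cool",
    ),
    ("balance pooled contacts", zoomify("cis_and_trans", 22, balance=True)),
    ("zoomify cis (no balancing)", zoomify("cis", 1, balance=False)),
    ("zoomify trans (no balancing)", zoomify("trans", 1, balance=False)),
    (
        "transfer pooled weights",
        "python Transfer_weights_from_cis_and_trans_wOldG2_fc_1_2_3_4.py",
    ),
]

for number, (label, cmd) in enumerate(steps, start=1):
    print(f"{number}. {label}: {cmd}")
```

Only the all-contact and pooled coolers are balanced directly; the cis and trans files never see `--balance` and receive their weights from the pooled file instead.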


## Fig. 1

### (d)