HiCPEP is a Python package for creating the Estimated PC1-pattern of the Hi-C Pearson matrix, which can be used to identify A/B compartments. Detailed documentation is available in the Sphinx documentation of HiCPEP.
All programs were tested on Ubuntu 22.04.4 LTS. HiCPEP requires python3, pip, and libcurl4-openssl-dev to be installed on your system.
For example (paste these commands into Bash or Zsh):
sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev # For installing hic-straw
sudo apt-get install -y python3
sudo apt-get install -y pip
sudo apt-get install -y git
git clone git@github.com:ZhiRongDev/HiCPEP.git
cd HiCPEP
python3 -m pip install -e .
If you have already installed the requirements, just paste these commands:
git clone git@github.com:ZhiRongDev/HiCPEP.git
cd HiCPEP
python3 -m pip install -e .
Case 1: Using tools such as Straw to create the Pearson matrix as input.
# Using hic-straw
from hicpep import peptools
import hicstraw
import numpy as np
hic_path = "https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic"  # Path to the Juicer's `.hic` file.
chrom = "1"
resolution = 1000000
normalization = "KR"
hic = hicstraw.HiCFile(hic_path)
for chromosome in hic.getChromosomes():
    if chromosome.name == chrom:
        chrom_size = int(chromosome.length)
matrix = hic.getMatrixZoomData(chrom, chrom, "oe", normalization, "BP", resolution)
matrix_np = matrix.getRecordsAsMatrix(0, chrom_size, 0, chrom_size)
pearson_np = np.corrcoef(matrix_np)
est_np = peptools.create_est(pearson_np=pearson_np)
print(f"est_np: {est_np}")
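One practical caveat with `np.corrcoef`: a bin with no contacts is an all-zero row, which has zero variance and therefore produces NaN correlations. The following is a minimal sketch, not part of HiCPEP, that drops such bins before computing the Pearson matrix. The helper name `pearson_without_empty_bins` and the toy matrix are hypothetical, and whether HiCPEP already handles empty bins internally is not covered here.

```python
import numpy as np


def pearson_without_empty_bins(oe_np):
    """Return (pearson_np, kept_idx) after dropping all-zero bins.

    Hypothetical helper for illustration; it is not a HiCPEP function.
    """
    oe_np = np.asarray(oe_np, dtype=float)
    # A bin with no contacts is an all-zero row: zero variance, so NaN in np.corrcoef.
    kept_idx = np.flatnonzero(oe_np.any(axis=1))
    # Keep the same bins on both axes so the matrix stays square and symmetric.
    trimmed = oe_np[np.ix_(kept_idx, kept_idx)]
    return np.corrcoef(trimmed), kept_idx


# Toy symmetric O/E-like matrix in which bin 2 has no contacts.
toy_oe = np.array([
    [1.0, 0.5, 0.0, 0.2],
    [0.5, 1.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.4, 0.0, 1.0],
])

pearson_np, kept_idx = pearson_without_empty_bins(toy_oe)
print("kept bins:", kept_idx)
print("contains NaN:", bool(np.isnan(pearson_np).any()))
```

The returned `kept_idx` lets you map each value of a downstream pattern back to its original bin position. Bins that are non-empty but constant would still yield NaN, so this sketch covers only the empty-bin case.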
Case 2: Using the Juicer-created Pearson text file as input.
from hicpep import peptools
pearson_np = peptools.read_pearson(
    pearson="gm12878_1000000_pearson_chr1.txt"
)
est_np = peptools.create_est(pearson_np=pearson_np)
print(f"est_np: {est_np}")
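A PC1-like pattern is usually turned into compartment calls by its sign. The sketch below illustrates that step on a toy vector. It assumes the pattern is a 1-D numeric array with one value per bin, which is an assumption about `est_np` and not something stated above. The function `label_compartments` and its `positive_label` parameter are hypothetical. Because the sign of a principal component is arbitrary, which sign corresponds to A has to be decided with external evidence such as gene density or GC content.

```python
import numpy as np


def label_compartments(pattern_np, positive_label="A"):
    """Map a PC1-like pattern to "A"/"B" labels by sign.

    Hypothetical helper for illustration; it is not a HiCPEP function.
    Zero or NaN entries are labeled "NA".
    """
    pattern_np = np.asarray(pattern_np, dtype=float)
    negative_label = "B" if positive_label == "A" else "A"
    # Start from "NA" so zero and NaN entries stay unassigned.
    labels = np.full(pattern_np.shape, "NA", dtype="<U2")
    labels[pattern_np > 0] = positive_label
    labels[pattern_np < 0] = negative_label
    return labels


# Toy pattern standing in for an estimated PC1-pattern (assumed 1-D).
toy_pattern = np.array([0.8, 0.3, -0.5, 0.0, -1.2])
labels = label_compartments(toy_pattern)
print(labels.tolist())
```

Passing `positive_label="B"` flips the assignment, which is the adjustment needed when the external evidence shows that positive values mark the B compartment.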
For more details, please check the examples. If you are interested in the programs we used for the paper, please check the code_for_paper. The HiCPEP Python library depends on NumPy, pandas, SciPy and Matplotlib.
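To confirm that the listed dependencies are importable in the current environment, a quick check along the following lines may help. It is a minimal sketch using only the standard library. The import names (`numpy`, `pandas`, `scipy`, `matplotlib`) are the usual ones for these packages, and the helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Usual import names for NumPy, pandas, SciPy and Matplotlib.
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can normally be installed with `python3 -m pip install <name>`.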
Zhi-Rong Cheng, Jia-Ming Chang. Decoding the Power of PC1: A Fast and Accurate Covariance-Based Method for A/B Compartment Identification in Hi-C Data.