GitHub - BinchaoPeng/pTADS: The pTADS (prediction of TAD boundary and strength) method can simultaneously predict the TAD boundary and boundary strength across multiple cell lines, which is independent of the Hi-C contact matrix.

Introduction

The pTADS (prediction of TAD boundary and strength) method simultaneously predicts TAD boundaries and characterizes boundary strength across multiple cell lines by integrating sequence information with epigenetic profile information such as histone and transcription factor binding information. pTADS consists of random forest models that predict TAD boundaries and a LASSO-based boundary score that characterizes TAD boundary strength. Neither component depends on the Hi-C contact matrix.

How to run it?

pTADS is developed in R and can be downloaded from https://github.com/chrom3DEpi/pTADS or https://github.com/YunlongWang-ylw/pTADS. This repository contains the scripts, examples, and required packages for pTADS.

Scripts: run_pTADS.ori.R;

Packages: Rscript, ggplot2, randomForest, caret, PRROC, pROC, getopt;

When you run the program, please follow the procedures in the README.txt file in the ./pTADS directory.

Required data: To run pTADS, prepare the following data:
-m: the pre-trained random forest model, in a *.RData file
-c: the coefficients for the features in the LASSO function, in a *.RData file
-d: the pre-calculated sequence and epigenetic profile features for each sample, in matrix format. Note that the features in the input matrix must be in the same order as in the given example.
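The feature-order requirement above can be checked before running the model. The following is a minimal sketch, assuming the feature matrices are loaded as tables with named columns. The helper `same_feature_order` and the feature names (`GC`, `CTCF`, `H3K4me3`) are hypothetical and are not taken from the pTADS example file.

```r
# Hypothetical helper: TRUE only if both tables have the same columns in the same order.
same_feature_order <- function(reference, candidate) {
  identical(colnames(reference), colnames(candidate))
}

# Toy tables with invented feature names (not the real pTADS features).
reference <- data.frame(GC = c(0.41, 0.39), CTCF = c(3, 0), H3K4me3 = c(1.2, 0.4))
candidate <- reference[, c("CTCF", "GC", "H3K4me3")]  # same features, wrong order

order_ok_before <- same_feature_order(reference, candidate)

# Reorder the candidate columns to follow the reference order.
reordered <- candidate[, colnames(reference)]
order_ok_after <- same_feature_order(reference, reordered)

print(c(before = order_ok_before, after = order_ok_after))
```

In practice the reference would be the header of the example matrix shipped with the repository, read with a function such as `read.table`. The delimiter and header layout of that file are not stated here, so check them before relying on this sketch.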

Examples:
Input files:
./model/GM12878_model.RData: the pre-trained random forest model for the GM12878 cell line
./model/GM12878_coeff1.RData: the coefficients for the features in the LASSO function for the GM12878 cell line
./example/test.chr1.40M_60M.matrix.txt: the pre-calculated sequence and epigenetic profile features (such as histone and transcription factor binding information) for the pre-trained random forest model and the LASSO function

Rscript ./Scripts/run_pTADS.ori.R -h
-m: the pre-trained random forest model, in a *.RData file
-c: the coefficients for the features in the LASSO function, in a *.RData file
-d: the pre-calculated features for each sample, in matrix format
-w: the size of the sliding window (for example, -win 10 represents 10 bins)
-s: the step of the sliding window across the whole genome (for example, -slide 5 represents an interval of 5 bins between adjacent regions)
-p: the smoothing parameter for the optimized TAD boundary scores, between 0 and 1; the default value is 0.5
-r: the resolution of the bins, which must be consistent with the resolution of the input samples (for example, -res 100000 represents 100 kbp, equal to 1 bin)
-o: the output directory
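To make the window size, step, and resolution options concrete, here is a minimal sketch of how sliding windows over bins can be enumerated. It illustrates the general sliding-window idea only and is not the code in run_pTADS.ori.R. The function `window_starts` and the 30-bin region are assumptions made for the example.

```r
# Hypothetical helper: 1-based start bins of all full windows of `win` bins,
# moving `slide` bins at a time across a region of `n_bins` bins.
window_starts <- function(n_bins, win, slide) {
  if (n_bins < win) return(numeric(0))
  seq(1, n_bins - win + 1, by = slide)
}

win   <- 10      # window size in bins
slide <- 5       # step in bins
res   <- 100000  # bin resolution in bp (100 kbp per bin)

starts <- window_starts(n_bins = 30, win = win, slide = slide)
windows <- data.frame(
  start_bin = starts,
  end_bin   = starts + win - 1,
  start_bp  = (starts - 1) * res,        # left edge of the first bin in the window
  end_bp    = (starts + win - 1) * res   # right edge of the last bin in the window
)
print(windows)
```

With a window of 10 bins and a step of 5 bins, adjacent windows overlap by 5 bins. A step of 1, as in the usage example, moves the window one bin at a time.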

Usage: Rscript ./Scripts/run_pTADS.ori.R -m ./model/GM12878_model.RData -c ./model/GM12878_coeff1.RData -d ./example/test.chr1.40M_60M.matrix.txt -w 10 -s 1 -p 0.5 -r 100000 -o Results
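The same command can also be launched from an R session. The sketch below builds the argument vector for the usage line above; `build_ptads_args` is a hypothetical helper, not part of pTADS. The resolution is formatted explicitly because R would otherwise convert 100000 to the string "1e+05".

```r
# Hypothetical helper: assemble the command-line arguments shown in the usage line.
build_ptads_args <- function(model, coeff, features, win = 10, slide = 1,
                             smooth = 0.5, res = 100000, out = "Results") {
  stopifnot(smooth >= 0, smooth <= 1)  # -p must lie between 0 and 1
  c("./Scripts/run_pTADS.ori.R",
    "-m", model, "-c", coeff, "-d", features,
    "-w", as.character(win), "-s", as.character(slide),
    "-p", as.character(smooth),
    "-r", format(res, scientific = FALSE),  # avoid "1e+05"
    "-o", out)
}

args <- build_ptads_args(
  model    = "./model/GM12878_model.RData",
  coeff    = "./model/GM12878_coeff1.RData",
  features = "./example/test.chr1.40M_60M.matrix.txt"
)
cat("Rscript", args, "\n")

# To actually run pTADS from the repository root, uncomment:
# system2("Rscript", args)
```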

Result files:
.predicted.TAD_boundary: the predicted TAD boundaries
.RF_BSSM: the prediction results for each sample, which include the TAD boundary strength, the optimized boundary score, and the pTADS predicted results

If you have any questions or suggestions, please contact us by email: Yunlong Wang ([email protected]), Yaping Fang ([email protected]), or Guoliang Li ([email protected]).
