- Create a Python virtual environment (Python 3.7.11) according to the 'requirement.txt' file;
- Put your sequence data into the 'data_cache' folder, and run 'data_preprocess.py' in this folder to generate three pkl files, namely, encode_data.pkl, label.pkl and dset_list.pkl;
- Download the pre-trained ProtBERT model (pytorch_model.bin) from http://ensemppis.idrblab.cn/download_ProtBERT, and put it into the 'feature_generator' folder;
- Run 'ProtBERT_feature_generator.py' in the 'feature_generator' folder to generate ProtBERT embeddings for sequences;
- Run 'main-TransformerPPIS.py' to train the TransformerPPIS model;
- Run 'main-GatCNNPPIS.py' to train the GatCNNPPIS model;
- Run 'predict_EnsemPPIS.py' to predict PPIS using the trained TransformerPPIS and GatCNNPPIS models.
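After the preprocessing step, a quick sanity check can confirm that the three pkl files exist and load cleanly before moving on. The sketch below is only an illustration and is not part of the repository: the helper name `load_preprocessed` is hypothetical, and it assumes the files are ordinary pickle files written to the 'data_cache' folder. Nothing is assumed about the structure of their contents.

```python
import pickle
from pathlib import Path

# File names produced by 'data_preprocess.py', as listed in the steps above.
EXPECTED_FILES = ("encode_data.pkl", "label.pkl", "dset_list.pkl")


def load_preprocessed(cache_dir="data_cache"):
    """Hypothetical helper: load the three pkl files, or raise if any is missing."""
    cache = Path(cache_dir)
    missing = [name for name in EXPECTED_FILES if not (cache / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {cache}: {', '.join(missing)}")
    loaded = {}
    for name in EXPECTED_FILES:
        # Assumption: each file is a standard pickle written by the preprocessing script.
        with open(cache / name, "rb") as handle:
            loaded[name] = pickle.load(handle)
    return loaded
```

Calling `load_preprocessed()` from the repository root returns a dict keyed by file name; inspecting `type(obj)` for each value is a cheap way to confirm the preprocessing finished before generating the ProtBERT embeddings.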
All the benchmark datasets used in this study are provided in the 'datasets' folder.
The two base models of EnsemPPIS (TransformerPPIS and GatCNNPPIS) were trained separately.
The command for training TransformerPPIS on CPU:
nohup python main-TransformerPPIS.py > out-TransformerPPIS.txt 2>&1 &
The commands for training TransformerPPIS on a single GPU, or with distributed training on multiple GPUs:
Single GPU:
CUDA_VISIBLE_DEVICES=1 nohup python -m torch.distributed.launch --nproc_per_node=1 --master_port 6666 main-TransformerPPIS.py > out-TransformerPPIS.txt 2>&1 &
Multiple GPUs:
CUDA_VISIBLE_DEVICES=1,2 nohup python -m torch.distributed.launch --nproc_per_node=2 --master_port 6666 main-TransformerPPIS.py > out-TransformerPPIS.txt 2>&1 &
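The single-GPU and multi-GPU commands above differ only in the device list and in `--nproc_per_node`, which equals the number of devices listed in `CUDA_VISIBLE_DEVICES`. As a rough sketch of that relationship, the hypothetical helpers below (`build_launch_command` and `as_shell_line`, neither of which is part of the repository) assemble the same launcher arguments from a list of GPU ids. They only build strings and do not start training.

```python
import shlex


def build_launch_command(gpu_ids, script="main-TransformerPPIS.py", master_port=6666):
    """Hypothetical helper: return (CUDA_VISIBLE_DEVICES value, argv list)."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # One worker process per visible GPU, as in both commands above.
    visible = ",".join(str(g) for g in gpu_ids)
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        "--master_port", str(master_port),
        script,
    ]
    return visible, argv


def as_shell_line(gpu_ids, log_file="out-TransformerPPIS.txt"):
    """Format the command as a background shell line with output redirected."""
    visible, argv = build_launch_command(gpu_ids)
    quoted = " ".join(shlex.quote(part) for part in argv)
    return f"CUDA_VISIBLE_DEVICES={visible} nohup {quoted} > {log_file} 2>&1 &"
```

For example, `as_shell_line([1, 2])` reproduces the multi-GPU line above, and `as_shell_line([1])` reproduces the single-GPU line. The port 6666 is taken from the commands above; any free port should work if that one is already in use.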