iTarget

Dependencies

iTarget should be deployed on Linux with Python 3.8.
Main requirements: python==3.8.8, pytorch==1.8.1.
To use a GPU, install the GPU version of PyTorch.

Install

Download the source code.
Deploy it on Linux.
Python environment preparation

We provide three packed conda environments so that users can set up the Python dependencies with Anaconda.

# operate in your own conda envs path, usually `~/anaconda3/envs` by default.
mkdir ~/anaconda3/envs/xmol
tar -zxvf ./_conda_envs/xmol.tar.gz -C ~/anaconda3/envs/xmol
mkdir ~/anaconda3/envs/esm2
tar -zxvf ./_conda_envs/esm2.tar.gz -C ~/anaconda3/envs/esm2
mkdir ~/anaconda3/envs/iTarget
tar -zxvf ./_conda_envs/iTarget.tar.gz -C ~/anaconda3/envs/iTarget
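
The three unpacking commands above follow one pattern, so they can also be scripted. The sketch below is a minimal Python equivalent, assuming the archives are trusted and named `<env>.tar.gz`; the helper name `unpack_envs` is hypothetical and is not part of iTarget.

```python
import tarfile
from pathlib import Path

# The three packed environments named in the commands above.
ENV_NAMES = ("xmol", "esm2", "iTarget")


def unpack_envs(archive_dir, envs_dir, names=ENV_NAMES):
    """Unpack <archive_dir>/<name>.tar.gz into <envs_dir>/<name> for each name.

    Hypothetical helper mirroring the `mkdir` + `tar -zxvf ... -C` commands.
    Returns the environment directories in the order they were unpacked.
    """
    archive_dir = Path(archive_dir).expanduser()
    envs_dir = Path(envs_dir).expanduser()
    created = []
    for name in names:
        archive = archive_dir / f"{name}.tar.gz"
        if not archive.is_file():
            raise FileNotFoundError(f"missing packed environment: {archive}")
        target = envs_dir / name
        target.mkdir(parents=True, exist_ok=True)  # same role as `mkdir`
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)  # same role as `tar -zxvf ... -C target`
        created.append(target)
    return created
```

With the default layout this would be called as `unpack_envs("./_conda_envs", "~/anaconda3/envs")`. Raising on a missing archive is a design choice of the sketch, so that a partial download is noticed before any environment is activated.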

Usage

1. Prepare LLM representation for proteins and compounds using Large Language Models (ESM-2 and X-MOL in this study)

1.1 Preprocess data for benchmarks.

python _data_preprocess.py
# the produced '{type}_drugs.csv' and '{type}_prots.csv' files can be used in step 1.2

1.2 Switch to the target path and work following the tutorials in the corresponding file.

cd ./_ForFeatures/esm2/		# for proteins
cd ./_ForFeatures/xmol/		# for compounds

# after finishing the representation, go back to the project root path

2. Template image construction

2.1 For template images, move the produced LLM feature files in step 1.2 to the working path `./data/original_data/scale/`.

mv ./_ForFeatures/esm2/data/{--esm2type}/{--datatype}/{--datatype}_all-data-merge-prot.csv ./data/original_data/scale/
# for proteins' template, by default, {--esm2type}='esm2_t36_3B_UR50D', {--datatype}='uniprot'

mv ./_ForFeatures/xmol/FT_to_embedding/data/for_output/{--datatype}_all-data-merge-drug.csv ./data/original_data/scale/
# for compounds' template, by default, {--datatype}='fullchembl'

2.2 Rename the moved feature files in `./data/original_data/scale/` so that both use the same {--scale_source} in place of {--datatype}, matching the corresponding setting in the downstream file `0_feadist.sh`. Here we use 'uniprot+fullchembl' as an example, which yields two files in `./data/original_data/scale/`: `uniprot+fullchembl_all-data-merge-prot.csv` and `uniprot+fullchembl_all-data-merge-drug.csv`.
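
The renaming in step 2.2 amounts to replacing the {--datatype} prefix of each file name with the shared {--scale_source}. The following is a minimal Python sketch, assuming the directory holds exactly one `*_all-data-merge-prot.csv` and one `*_all-data-merge-drug.csv` file as moved in step 2.1; the function name `rename_to_scale_source` is hypothetical.

```python
from pathlib import Path

# File-name endings produced in step 1.2 (taken from the paths in step 2.1).
SUFFIXES = ("_all-data-merge-prot.csv", "_all-data-merge-drug.csv")


def rename_to_scale_source(scale_dir, scale_source):
    """Rename '<datatype>_all-data-merge-{prot,drug}.csv' in scale_dir to
    '<scale_source>_all-data-merge-{prot,drug}.csv'.

    Hypothetical helper; returns the new paths in (prot, drug) order.
    """
    scale_dir = Path(scale_dir)
    renamed = []
    for suffix in SUFFIXES:
        matches = sorted(scale_dir.glob(f"*{suffix}"))
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one '*{suffix}' file, found {len(matches)}"
            )
        target = scale_dir / f"{scale_source}{suffix}"
        if matches[0] != target:  # already renamed: nothing to do
            matches[0].rename(target)
        renamed.append(target)
    return renamed
```

For the example in the text, `rename_to_scale_source("./data/original_data/scale", "uniprot+fullchembl")` would produce the two file names listed above.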

2.3 For feature images, move the produced LLM feature files to the working path `./data/original_data/`.

mv ./_ForFeatures/esm2/data/{--esm2type}/{--datatype}/{--datatype}_all-data-merge-prot.csv ./data/original_data/
# for proteins' features, by default, {--esm2type}='esm2_t36_3B_UR50D', {--datatype}='example' or user-defined

mv ./_ForFeatures/xmol/FT_to_embedding/data/for_output/{--datatype}_all-data-merge-drug.csv ./data/original_data/
# for compounds' features, by default, {--datatype}='example' or user-defined
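
The two `mv` commands of step 2.3 differ only in the placeholder values, so the paths can be built from {--datatype} and {--esm2type}. The sketch below assumes the directory layout shown in the commands above and a single {--datatype} for both files, as in the 'example' default; the function names `feature_paths` and `move_features` are hypothetical.

```python
import shutil
from pathlib import Path


def feature_paths(root, datatype, esm2type="esm2_t36_3B_UR50D"):
    """Build the (protein, compound) feature-file paths used in step 2.3."""
    root = Path(root)
    prot = (root / "_ForFeatures" / "esm2" / "data" / esm2type / datatype
            / f"{datatype}_all-data-merge-prot.csv")
    drug = (root / "_ForFeatures" / "xmol" / "FT_to_embedding" / "data"
            / "for_output" / f"{datatype}_all-data-merge-drug.csv")
    return prot, drug


def move_features(root, datatype, dest, esm2type="esm2_t36_3B_UR50D"):
    """Move both feature files into dest; returns the new paths.

    Hypothetical helper mirroring the two `mv` commands.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    sources = feature_paths(root, datatype, esm2type)
    for src in sources:  # check both first so a failure moves nothing
        if not src.is_file():
            raise FileNotFoundError(f"feature file not found: {src}")
    return [Path(shutil.move(str(src), str(dest / src.name))) for src in sources]
```

From the project root, `move_features(".", "example", "./data/original_data")` would correspond to the default settings named in the comments above.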

2.4 Switch to the bashes path for feature distance calculation and feature image transformation.

cd bashes
conda activate iTarget

# calculate feature distance
sh 0_feadist.sh	# by default, {--scale_method}='standard', {--scale_source}='uniprot+fullchembl'

# copy calculated configs to work path
cp ../data/processed_data/drug_fea/scale/standard/*.cfg ./feamap/config/trans_from_{--scale_source}/
cp ../data/processed_data/protein_fea/scale/standard/*.cfg ./feamap/config/trans_from_{--scale_source}/
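
The two `cp` commands above fail if the `trans_from_{--scale_source}` directory does not exist yet. The sketch below is a minimal Python variant that creates the destination first; the function name `copy_cfg_files` is hypothetical, and the source and destination directories are passed in rather than assumed.

```python
import shutil
from pathlib import Path


def copy_cfg_files(source_dirs, dest_dir):
    """Copy every *.cfg file from each source directory into dest_dir.

    Hypothetical helper; returns the copied paths. As with `cp`, a later
    file overwrites an earlier one that has the same name.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # plain `cp` would fail here
    copied = []
    for source in map(Path, source_dirs):
        for cfg in sorted(source.glob("*.cfg")):
            copied.append(Path(shutil.copy2(cfg, dest_dir / cfg.name)))
    if not copied:
        raise FileNotFoundError("no .cfg files found; has 0_feadist.sh finished?")
    return copied
```

With the defaults from this step, the call would take `../data/processed_data/drug_fea/scale/standard` and `../data/processed_data/protein_fea/scale/standard` as `source_dirs` and `./feamap/config/trans_from_uniprot+fullchembl` as `dest_dir`.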

3. Image transformation

sh 1_trans_drug.sh # for compounds, by default, {--scale_method}='standard', {--disttype}='uniprot+fullchembl', {--source}='example' or user-defined
sh 1_trans_prot.sh # for proteins, by default, {--scale_method}='standard', {--disttype}='uniprot+fullchembl', {--source}='example' or user-defined

4. Model training and Cross-validation

4.1 Prepare dataset for cross-validation:

sh 2_split_cvdata.sh
# optional; alternatively, prepare the files directly, following the examples in `./data/processed_data/split_cvdata/`.
# This step is not required for the bindingdb benchmark, for which it was already done in step 1.1
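
For users who prepare the split files themselves, the sketch below illustrates generic k-fold splitting in plain Python. It is not the logic of `2_split_cvdata.py`, and the function name `kfold_indices` and its `seed` parameter are assumptions made for illustration; the expected file format should still be taken from the examples in `./data/processed_data/split_cvdata/`.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of k folds.

    Illustrative only: sample indices are shuffled once with a fixed seed,
    dealt into k folds, and each fold serves as the validation set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val
```

Each sample index appears in exactly one validation fold, so the k validation sets together cover the whole dataset.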

4.2 Run model training and cross-validation:

sh 3_train_cv.sh # by default, {--kfold_num}=5, {--task}='cv', {--n_epochs}=128, {--gpu}=0, {--batch_size}=512, {--lr}=5e-4, {--monitor}='auc_val', {--source}='example'
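
To keep the defaults listed above in one place when experimenting, they can be mirrored in a small argument parser. This is a hypothetical sketch: the flag names and default values come from the comment above, while the argument types and the function name `build_parser` are assumptions, and the parser actually used by the project may differ.

```python
import argparse


def build_parser():
    """Return a parser that mirrors the defaults listed for 3_train_cv.sh.

    Hypothetical: the argument types are assumed, not read from the project.
    """
    parser = argparse.ArgumentParser(description="iTarget training defaults (sketch)")
    parser.add_argument("--kfold_num", type=int, default=5)
    parser.add_argument("--task", type=str, default="cv")
    parser.add_argument("--n_epochs", type=int, default=128)
    parser.add_argument("--gpu", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=512)
    parser.add_argument("--lr", type=float, default=5e-4)
    parser.add_argument("--monitor", type=str, default="auc_val")
    parser.add_argument("--source", type=str, default="example")
    return parser
```

Calling `build_parser().parse_args([])` returns the defaults, and individual values can be overridden, for example with `["--lr", "1e-3"]`.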

Citation and Disclaimer

The manuscript is currently under peer review. Should you have any questions, please contact Dr. Zhang at [email protected] and Dr. Mou at [email protected]

