Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction
Hanyu Zhang, Yuan Zhou, Zhichao Zhang, Huaicheng Sun, Ziqi Pan, Minjie Mou, Wei Zhang, Qing Ye, Tingjun Hou, Honglin Li * , Chang-Yu Hsieh * and Feng Zhu *
- LEDAP should be deployed on Linux in Python 3.8.
- Main requirements: `python==3.8.8`, `pytorch==1.10.1`, `xgboost==2.0.3`, `scikit-learn==0.24.1`, `optuna==2.10.0`. `requirements.txt` is provided for environment dependency installation by `pip install -r requirements.txt`.
- To use GPU, please install the GPU version of `pytorch`.
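After installation, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch, not part of LEDAP: the helper name `find_mismatches` is hypothetical, and it assumes pytorch was installed under the pip distribution name `torch`.

```python
from importlib import metadata
from typing import Dict

# Pins copied from the requirements listed above. "torch" is assumed to be
# the pip distribution name under which pytorch was installed.
PINNED = {
    "torch": "1.10.1",
    "xgboost": "2.0.3",
    "scikit-learn": "0.24.1",
    "optuna": "2.10.0",
}


def find_mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or differs."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (wanted {wanted})"
            continue
        # GPU wheels may report a local suffix such as "1.10.1+cu113";
        # only the part before "+" is compared with the pin.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, wanted {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in find_mismatches(PINNED).items():
        print(f"{name}: {problem}")
```

An empty result means every pinned package is present at the expected version.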
- Download the source code of LEDAP.
- The LEDAP tree includes directories as follows:
|- DDA
    |- bashes
    |- data
    |- rf
|- DDI
    |- bashes
    |- data
    |- xbgoost
|- DSA
    |- bashes
    |- data
    |- xbgoost
|- paper
    |- materials
|- representations
    |- llama_2-7b
|- README.md
|- requirements.txt
|- LICENSE
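To check that a downloaded copy has the layout listed above, a short script can report any missing entries. This is a sketch only: the helper `missing_entries` is hypothetical, and the nesting (for example `bashes`, `data` and `rf` under `DDA`) is an assumption read from the order of the listing, with directory names copied verbatim.

```python
from pathlib import Path
from typing import List

# Directory names are copied verbatim from the listing above; the nesting
# is an assumption inferred from the order of the entries.
EXPECTED_DIRS = [
    "DDA/bashes", "DDA/data", "DDA/rf",
    "DDI/bashes", "DDI/data", "DDI/xbgoost",
    "DSA/bashes", "DSA/data", "DSA/xbgoost",
    "paper/materials",
    "representations/llama_2-7b",
]
EXPECTED_FILES = ["README.md", "requirements.txt", "LICENSE"]


def missing_entries(root: Path) -> List[str]:
    """List expected directories and files that are absent under root."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Run from the top of the LEDAP checkout.
    for entry in missing_entries(Path(".")):
        print(f"missing: {entry}")
```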
1. Prepare feature representation for bio-entities using Large Language Models (Llama 2 in this study)
1.2 Conduct bio-text preprocessing and feature transformation following the Llama 2 Release
Note: the prepared LLM-based representations used in this study were made available on Google Drive. The associated account has been unexpectedly deactivated by Google, and we are working to fix the issue; please wait. Alternative download: MEGA.
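This README does not state how the token-level outputs of Llama 2 are reduced to a single vector per bio-entity. One common choice, shown here purely as an assumption and not as the authors' confirmed method, is mean pooling over the non-padding tokens. The sketch uses plain Python lists so that it runs without a model; in practice the inputs would be the hidden states and attention mask produced by the LLM, and the function name `mean_pool` is hypothetical.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors whose mask entry is non-zero.

    hidden_states: one vector per token (all of equal length).
    attention_mask: 1 for real tokens, 0 for padding.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask differ in length")
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


if __name__ == "__main__":
    # Three tokens of dimension 2; the last token is padding and is ignored.
    states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    print(mean_pool(states, [1, 1, 0]))  # [2.0, 3.0]
```

Whatever pooling is used, the result is one fixed-length vector per drug, disease or side effect, which is the form the downstream predictors consume.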
2.1 Switch to the target path the user wants to investigate (`cd ./DDA` for drug-disease association, `cd ./DDI` for drug-drug interaction, `cd ./DSA` for drug-side effect association), or construct a new path for additional research, imitating the examples.
2.2 Place the prediction data that users want to investigate into `./data`, imitating the examples.
2.3 Switch to the `./bashes` directory and modify the bash files according to the recorded guidance, then execute the following command:
sh run.sh # Run the model for DBA prediction
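Steps 2.1 and 2.3 can also be driven from Python when several tasks are run in sequence. The sketch below is an illustration under stated assumptions: the task keys and the helpers `build_run` and `run_task` are hypothetical, and `bashes` is assumed to sit directly under each task directory with a `run.sh` inside it.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Hypothetical task keys mapped to the directories named in step 2.1.
TASK_DIRS = {
    "drug-disease": "DDA",
    "drug-drug": "DDI",
    "drug-side-effect": "DSA",
}


def build_run(task: str, root: Path = Path(".")) -> Tuple[List[str], Path]:
    """Return the command and the working directory for one task."""
    if task not in TASK_DIRS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(TASK_DIRS)}")
    # Assumes run.sh lives in <root>/<task dir>/bashes, as in step 2.3.
    return ["sh", "run.sh"], root / TASK_DIRS[task] / "bashes"


def run_task(task: str, root: Path = Path(".")) -> int:
    """Run `sh run.sh` inside the task's bashes directory; return the exit code."""
    command, workdir = build_run(task, root)
    return subprocess.run(command, cwd=str(workdir), check=False).returncode


if __name__ == "__main__":
    # Run from the top of the LEDAP checkout after editing the bash files.
    print(run_task("drug-drug"))
```

Keeping the command construction separate from execution makes it possible to inspect the working directory before anything is launched.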
The manuscript is published in Analytical Chemistry.
Please cite: Zhang H, Zhou Y, Zhang Z, Sun H, Pan Z, Mou M, Zhang W, Ye Q, Hou T, Li H, Hsieh CY, Zhu F. Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction. Anal. Chem. 96(30), 12395–12403
Should you have any questions, please contact Dr. Zhang at [email protected]