Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction

Hanyu Zhang, Yuan Zhou, Zhichao Zhang, Huaicheng Sun, Ziqi Pan, Minjie Mou, Wei Zhang, Qing Ye, Tingjun Hou, Honglin Li^, Chang-Yu Hsieh^ and Feng Zhu^*

Graphical Abstract

Dependencies

LEDAP should be deployed on Linux with Python 3.8.
Main requirements: python==3.8.8, pytorch==1.10.1, xgboost==2.0.3, scikit-learn==0.24.1, optuna==2.10.0.
A requirements.txt file is provided; install the dependencies with `pip install -r requirements.txt`.
To use a GPU, install the GPU version of pytorch.
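
After installation, the pinned versions can be checked against the active environment. The snippet below is a minimal sketch and not part of LEDAP; the helper name `check_pins` is hypothetical, and it assumes the packages were installed with pip under their usual distribution names (pytorch is distributed as `torch`).

```python
from importlib import metadata

# Pins copied from the "Main requirements" line above.
# Assumption: pytorch is installed under its pip distribution name "torch".
REQUIRED = {
    "torch": "1.10.1",
    "xgboost": "2.0.3",
    "scikit-learn": "0.24.1",
    "optuna": "2.10.0",
}


def check_pins(required):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # GPU builds of torch may carry a local suffix such as "+cu113"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(REQUIRED).items():
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

An empty result means every listed pin matches; anything printed points at a package to reinstall.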

Install

Download the LEDAP source code.
LEDAP should be deployed on Linux.
The LEDAP tree includes the following directories:

 |- DDA
    |- bashes
    |- data
    |- rf
 |- DDI
    |- bashes
    |- data
    |- xbgoost
 |- DSA
    |- bashes
    |- data
    |- xbgoost
 |- paper
    |- materials
 |- representations
    |- llama_2-7b
 |- README.md
 |- requirements.txt
 |- LICENSE
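
To confirm that a downloaded copy matches the tree above, a small check along these lines may help. It is only a sketch: the helper `missing_dirs` is hypothetical, and the directory names are copied from the listing exactly as spelled there.

```python
from pathlib import Path

# Directory names copied verbatim from the tree listing above.
EXPECTED_DIRS = [
    "DDA/bashes", "DDA/data", "DDA/rf",
    "DDI/bashes", "DDI/data", "DDI/xbgoost",
    "DSA/bashes", "DSA/data", "DSA/xbgoost",
    "paper/materials",
    "representations/llama_2-7b",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the top of the LEDAP checkout.
    for d in missing_dirs("."):
        print("missing:", d)
```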

Usage

1. Prepare feature representation for bio-entities using Large Language Models (Llama 2 in this study)

1.1 Collect textual descriptions according to the respective requirements

1.2 Conduct bio-text preprocessing and feature transformation following the Llama 2 Release

1.3 Place the representation data in `./representations/llama_2-7b/`, following the format of the provided examples.

Note: the prepared LLM-based representations used in this study are available on MEGA. They were previously hosted on Google Drive, but the associated account was unexpectedly deactivated by Google.
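
Step 1.2 turns each textual description into a fixed-length vector. This README does not state how the token-level outputs of Llama 2 are reduced to a single vector, so the sketch below assumes mean pooling over non-padding tokens, a common choice; the function name `mean_pool` and the toy inputs are illustrative only. In practice the token vectors would be the model's hidden states rather than hand-written lists.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (assumed pooling scheme)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    # Column-wise mean over the kept tokens gives one vector per description.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


if __name__ == "__main__":
    # Three toy token vectors; the last one is padding and is masked out.
    tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(mean_pool(tokens, mask))
```

Masking matters because padded positions would otherwise pull the average toward the padding embedding.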

2. Use LLM-based representations to analyze drug biomedical associations

2.1 Change to the directory for the task you want to investigate (`cd ./DDA` for drug-disease association, `cd ./DDI` for drug-drug interaction, `cd ./DSA` for drug-side effect association), or create a new directory for another task, modeled on the examples.

2.2 Place the data to be predicted in `./data`, following the format of the provided examples.

2.3 Change to the `./bashes` directory, edit the bash files according to the guidance recorded in them, then execute the following command:

sh run.sh	# Run the model for DBA prediction
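
Conceptually, step 2 pairs each drug with a second entity (a disease, another drug, or a side effect) and passes the pair's representation to a classifier. How LEDAP combines the two vectors is not described in this README, so the sketch below assumes simple concatenation; the function `build_pair_features` and all identifiers in the example are hypothetical.

```python
def build_pair_features(pairs, drug_vecs, partner_vecs):
    """Concatenate the two entity vectors for every pair that has both.

    Assumption: concatenation is used to join the vectors; pairs with a
    missing representation are skipped rather than imputed.
    """
    features, kept = [], []
    for drug_id, partner_id in pairs:
        if drug_id in drug_vecs and partner_id in partner_vecs:
            features.append(list(drug_vecs[drug_id]) + list(partner_vecs[partner_id]))
            kept.append((drug_id, partner_id))
    return features, kept


if __name__ == "__main__":
    # Made-up identifiers and two-dimensional vectors, for illustration only.
    drugs = {"drug_a": [0.1, 0.2]}
    diseases = {"disease_x": [0.3, 0.4]}
    X, kept = build_pair_features(
        [("drug_a", "disease_x"), ("drug_b", "disease_x")], drugs, diseases
    )
    print(X, kept)
```

The resulting rows could then be fed to a classifier of the kind suggested by the `rf` and `xbgoost` directory names, together with the association labels.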

Citation and Disclaimer

The manuscript is published in Analytical Chemistry.

Please cite: Zhang H, Zhou Y, Zhang Z, Sun H, Pan Z, Mou M, Zhang W, Ye Q, Hou T, Li H, Hsieh CY, Zhu F. Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction. Anal. Chem. 96(30), 12395–12403

Should you have any questions, please contact Dr. Zhang at [email protected]



