Bioinformatics-Project

This project uses bioactivity data for Zika virus inhibitors from the ChEMBL database. The raw data is preprocessed to remove duplicates, drop records with missing values, and classify compounds as active, inactive, or intermediate based on potency cutoffs. IC50 values are converted to pIC50, and key physicochemical properties are calculated, including molecular weight, LogP, and the numbers of hydrogen bond donors and acceptors. The curated dataset is then analyzed with statistical tests and visualizations to compare property distributions between active and inactive compounds.
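The curation and labeling steps described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's actual code: the cutoffs (1,000 nM and 10,000 nM), the field names `smiles` and `ic50_nm`, and the toy records are all assumptions, since this README does not state the values used.

```python
import math

# Assumed cutoffs (not stated in this README): IC50 <= 1,000 nM is "active",
# IC50 >= 10,000 nM is "inactive", anything in between is "intermediate".
ACTIVE_MAX_NM = 1_000.0
INACTIVE_MIN_NM = 10_000.0


def classify_bioactivity(ic50_nm: float) -> str:
    """Label a compound from its IC50 in nanomolar."""
    if ic50_nm <= ACTIVE_MAX_NM:
        return "active"
    if ic50_nm >= INACTIVE_MIN_NM:
        return "inactive"
    return "intermediate"


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def curate(records):
    """Drop rows with missing values or duplicate SMILES, then annotate."""
    seen = set()
    curated = []
    for rec in records:
        smiles, ic50 = rec.get("smiles"), rec.get("ic50_nm")
        if not smiles or ic50 is None or smiles in seen:
            continue
        seen.add(smiles)
        curated.append({
            "smiles": smiles,
            "ic50_nm": ic50,
            "pIC50": pic50_from_nm(ic50),
            "class": classify_bioactivity(ic50),
        })
    return curated


# Toy placeholder records (made-up values, not real ChEMBL data).
raw = [
    {"smiles": "CCO", "ic50_nm": 500.0},
    {"smiles": "CCO", "ic50_nm": 500.0},      # duplicate, dropped
    {"smiles": "c1ccccc1", "ic50_nm": None},  # missing value, dropped
    {"smiles": "CCN", "ic50_nm": 5_000.0},
    {"smiles": "CCC", "ic50_nm": 20_000.0},
]
curated = curate(raw)
for row in curated:
    print(row["smiles"], round(row["pIC50"], 2), row["class"])
```

Working on the pIC50 scale rather than raw IC50 spreads potency values more evenly, which is the usual reason for making the conversion before regression modeling.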

Molecular descriptors are computed from SMILES strings using the PaDEL software package. Descriptors with low variance are filtered out to derive a final set of informative descriptors for model building.
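A low-variance filter of the kind described above might look like the following sketch. The threshold of 0.16 is an assumption (the README does not give the value); for binary fingerprint bits it corresponds to dropping any bit that takes the same value in more than 80% of compounds, since a bit set in a fraction p of compounds has variance p(1 - p). The column names and values are made up for illustration.

```python
from statistics import pvariance

# Assumed threshold; the value used in the project is not stated here.
VARIANCE_THRESHOLD = 0.16


def drop_low_variance(columns, threshold=VARIANCE_THRESHOLD):
    """Keep only descriptor columns whose population variance exceeds threshold."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


# Hypothetical fingerprint bits for ten compounds.
descriptors = {
    "bit_a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # constant, variance 0
    "bit_b": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # variance 0.25
    "bit_c": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # variance 0.09
}
kept = drop_low_variance(descriptors)
print(sorted(kept))
```

The same filter is available in scikit-learn as `VarianceThreshold`; the plain-Python version is shown only to keep the sketch free of dependencies.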

A Random Forest regression model is developed to predict the pIC50 antiviral activity from the descriptors.

The trained regression model can be applied to new compounds by taking their SMILES strings, computing descriptors with PaDEL, and making predictions with the saved Random Forest model. This allows rapid screening of candidate Zika inhibitors without needing to synthesize and test each compound experimentally.
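One detail that is easy to get wrong when applying a saved model is that the descriptor columns for new compounds must match, in name and order, the columns kept during training. The sketch below shows that alignment step under several assumptions: the descriptor table is a CSV with a `Name` column followed by descriptor columns, `feature_names` is a hypothetical list of the retained descriptors, and `StandInModel` is a placeholder for the unpickled Random Forest so that the example runs without any dependencies.

```python
import csv
import io


def load_feature_rows(descriptor_csv, feature_names):
    """Read a descriptor table and return (names, rows), with each row's
    columns in the order the model was trained on."""
    reader = csv.DictReader(descriptor_csv)
    names, rows = [], []
    for record in reader:
        names.append(record["Name"])
        rows.append([float(record[col]) for col in feature_names])
    return names, rows


class StandInModel:
    """Placeholder exposing a predict() method like a fitted regressor.
    In practice this would be the saved model loaded with pickle."""

    def predict(self, rows):
        # Arbitrary toy rule, used only so the example produces numbers.
        return [4.0 + sum(row) for row in rows]


# Hypothetical descriptor output for two new compounds.
descriptor_csv = io.StringIO(
    "Name,bit_a,bit_b,bit_c\n"
    "cmpd_1,0,1,1\n"
    "cmpd_2,0,0,1\n"
)
feature_names = ["bit_b", "bit_c"]  # assumed columns kept after filtering

names, rows = load_feature_rows(descriptor_csv, feature_names)
predictions = StandInModel().predict(rows)
for name, value in zip(names, predictions):
    print(name, value)
```

Selecting columns by name rather than by position means extra or reordered descriptor columns in the new table do not silently shift the model's inputs.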

Overall, this workflow demonstrates a typical QSAR modeling approach leveraging public bioactivity data, physicochemical properties, molecular descriptors, machine learning algorithms, and rigorous validation to derive predictive models for accelerated antiviral drug discovery. The code encapsulates data cleaning, analysis, model development, and application to new compounds.

Process for using testing.py:

1. Update Testing/input_smiles.txt with the SMILES strings of the chemical structures you want to test.
2. Unzip the PaDEL software archive, padel.zip.
3. Run testing.py.

LINKS:

PaDEL software: https://github.com/dataprofessor/bioinformatics/

