Skip to content

jessexmaki/zika-antiviral-qsar-modeling

Repository files navigation

Bioinformatics-Project

This project utilizes bioactivity data on Zika virus inhibitors from the ChEMBL database. The raw bioactivity data is preprocessed to remove duplicates, filter out missing values, and classify compounds into active/inactive/intermediate categories based on potency cutoffs. Key physicochemical properties are calculated including pIC50, molecular weight, LogP, hydrogen bond donors/acceptors. The curated dataset is analyzed using statistical tests and visualizations to compare property distributions between active and inactive compounds.

Molecular descriptors are computed from SMILES strings using the PaDEL software package. Descriptors with low variance are filtered out to derive a final set of informative descriptors for model building.

A Random Forest regression model is developed to predict the pIC50 antiviral activity from the descriptors.

The trained regression model can be applied to new compounds by taking their SMILES string, computing descriptors using PaDEL, and making a prediction using the saved Random Forest model. This allows rapid prediction of new Zika inhibitors without needing to synthesize and test each compound experimentally.

Overall, this workflow demonstrates a typical QSAR modeling approach leveraging public bioactivity data, physicochemical properties, molecular descriptors, machine learning algorithms, and rigorous validation to derive predictive models for accelerated antiviral drug discovery. The code encapsulates data cleaning, analysis, model development, and application to new compounds.


Process for using testing.py:

  • Update the information in the Testing/input_smiles.txt for the chemical structures you want to test
  • Unzip the PaDel software padel.zip
  • Run testing.py

LINKS:

PaDel Software: https://github.com/dataprofessor/bioinformatics/

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published