pyClassify - Automated ML-based Tool for Classification Problems

This project aims to provide a Python tool that automates binary or multi-class classification on any input data matrix, supplied as a standard Feature x Instance matrix in a .csv file.

Project Structure

The project consists of multiple Python scripts, each responsible for a specific step in the machine learning pipeline. The main script orchestrates the execution of these scripts, ensuring a seamless and automated workflow.

Project Components

Preprocessing:
- This script handles missing values and string conversion for the main input file.
- User input: classification data in .csv format.
- Output: Preprocessed data matrix.
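
As a rough illustration of this step, the sketch below fills missing numeric cells with the column mean and maps string columns to integer codes. The README does not say which strategies `preprocessing_data.py` actually applies, so mean imputation, the integer encoding, the function name `preprocess`, and the sample columns are all assumptions. It uses only the standard library.

```python
import csv
import io
from statistics import mean


def preprocess(rows):
    """Return a cleaned copy of `rows` (a list of dicts from csv.DictReader).

    Assumed strategy: numeric columns get missing cells filled with the
    column mean; string columns are mapped to integer codes.
    """
    columns = list(rows[0].keys())
    cleaned = [dict(row) for row in rows]
    for col in columns:
        values = [(row[col] or "").strip() for row in rows]
        present = [v for v in values if v != ""]
        try:
            numbers = [float(v) for v in present]
            is_numeric = True
        except ValueError:
            is_numeric = False
        if is_numeric:
            fill = mean(numbers) if numbers else 0.0
            for row, v in zip(cleaned, values):
                row[col] = float(v) if v != "" else fill
        else:
            # Codes follow the sorted order of the distinct strings.
            codes = {label: i for i, label in enumerate(sorted(set(present)))}
            for row, v in zip(cleaned, values):
                row[col] = codes.get(v, -1)  # -1 marks a missing string
    return cleaned


# Hypothetical three-row input with one missing numeric cell.
sample = io.StringIO("height,colour,label\n1.0,red,yes\n,blue,no\n3.0,red,yes\n")
cleaned = preprocess(list(csv.DictReader(sample)))
```
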
Normalization/Standardization:
- This script handles the normalization or standardization of the input data.
- User input: Type of normalization or standardization.
- Output: Normalized or standardized data matrix.
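
The two options named here can be sketched for a single feature column as follows. Min-max scaling and z-score standardization are common choices, but the README does not state which methods `normalization_data.py` offers, so the function names and the `method` strings below are assumptions.

```python
from statistics import mean, pstdev


def min_max(column):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: nothing to rescale
    return [(x - lo) / (hi - lo) for x in column]


def z_score(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def scale(column, method):
    """Dispatch on a user-supplied method name (hypothetical option strings)."""
    methods = {"minmax": min_max, "zscore": z_score}
    if method not in methods:
        raise ValueError(f"unknown method: {method!r}")
    return methods[method](column)
```

For example, `scale([2.0, 4.0, 6.0], "minmax")` maps the smallest value to 0 and the largest to 1.
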
Feature Selection:
- This script performs feature selection on the preprocessed data.
- User input: Feature selection method and parameters.
- Output: Data matrix with selected features.
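
The README leaves the feature selection method open. As one simple possibility, the sketch below drops features whose variance does not exceed a threshold. This is a stand-in technique chosen for illustration, not necessarily what `feature_selection.py` implements; the function name, the row-per-instance layout, and the `threshold` parameter are assumptions.

```python
from statistics import pvariance


def select_by_variance(matrix, feature_names, threshold=0.0):
    """Keep only the features whose population variance exceeds `threshold`.

    `matrix` is a list of rows, one row per instance (assumed layout).
    Returns the reduced matrix and the names of the kept features.
    """
    columns = list(zip(*matrix))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in matrix]
    return reduced, [feature_names[i] for i in keep]
```
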
Cross-Validation Script:
- Implements cross-validation on the data.
- User input: Number of folds for cross-validation.
- Output: Cross-validated performance metrics.
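
To make the role of the fold count concrete, here is a minimal k-fold splitter that yields train and test indices for each fold. It is a sketch under assumptions: the function name, the shuffling, and the `seed` parameter are illustrative and not taken from `cross_validation.py`.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) for each of `n_folds` folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx
```

Each instance lands in the test set exactly once, so averaging a metric over the folds uses every instance for evaluation.
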
Machine Learning Modeling Script:
- Executes the machine learning modeling for classification.
- User input: Classification algorithm and hyperparameters.
- Output: Trained machine learning model.
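
The README does not list the supported algorithms, so the following only illustrates the fit/predict shape of this step with a deliberately simple nearest-centroid classifier. The class name `CentroidClassifier` and its methods are hypothetical and written from scratch here; they are not the project's API.

```python
from collections import defaultdict
from math import dist


class CentroidClassifier:
    """Assign each instance to the class whose mean feature vector is closest."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        # One centroid per class: the column-wise mean of that class's rows.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda label: dist(row, self.centroids_[label]))
            for row in X
        ]
```
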
Prediction Script:
- Evaluates the predictive capability of the model on a blind dataset.
- Output: Accuracy and other performance metrics on the blind dataset.
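
Accuracy and two of the "other performance metrics" can be computed from true and predicted labels as sketched below. Which metrics `blind_predict.py` actually reports is not stated, so precision and recall are shown as plausible examples, and the function names are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    counts = confusion_counts(y_true, y_pred)
    tp = counts[(label, label)]
    fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
    fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```
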
Main Script:
- Orchestrates the execution of the above scripts.
- User input: File path of the input data matrix (.csv).
- Output: Plots, heatmaps, and performance metrics, written to a PDF file.
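
One way to picture the orchestration is as an ordered list of named steps, each taking the output of the previous one. How `main.py` actually invokes the other scripts is not described, so `run_pipeline` and the toy stand-in steps below are purely illustrative assumptions.

```python
def run_pipeline(data, steps):
    """Apply each (name, function) step in order; return the result and the step log."""
    completed = []
    for name, step in steps:
        data = step(data)
        completed.append(name)
    return data, completed


# Toy stand-ins for the real stages (hypothetical behaviour).
steps = [
    ("preprocessing", lambda rows: [r for r in rows if None not in r]),
    ("normalization", lambda rows: [[x / 10 for x in r] for r in rows]),
]
result, completed = run_pipeline([[10, 20], [None, 5], [30, 40]], steps)
```

Keeping the steps in a list makes it straightforward to skip or reorder a stage based on the options the user provides at the prompts.
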

Usage

Clone the repository:

git clone https://github.com/coderkage/pyClassify.git
cd pyClassify

Install dependencies:
```
pip install fpdf
```
Run the main script:
```
python main.py
```
Follow the prompts to provide input options for each step of the pipeline.
Check the output PDF file for performance metrics and plots.

Notes

Ensure that the input data matrix is in the required format (Feature x Instance matrix in .csv format).

Feel free to contribute, report issues, or suggest improvements. Happy classifying!

Repository files:
- 01 Problem Statement.pdf
- README.md
- blind_predict.py
- breast_cancer.csv
- children_anemia.csv
- classification_model.py
- cross_validation.py
- faults.csv
- feature_selection.py
- iris.csv
- main.py
- normalization_data.py
- preprocessing_data.py
- problem statement.pdf
