Table of Contents
- Introduction
- 1.1 Program Overview
- Data Source
- 2.1 Dataset Overview
- 2.2 Accessing the Dataset
- Task Specifications
- 3.1 Data Management
- 3.1.1 Data Acquisition
- 3.1.2 Exploratory Data Analysis (EDA)
- 3.1.3 Data Preprocessing
- 3.2 Model Engineering
- 3.2.1 Dataset Splitting
- 3.2.2 Model Architecture
- 3.2.3 Model Training and Validation
- 3.3 Evaluation and Analysis
- 3.3.1 Performance Testing
- 3.3.2 Metrics Reporting
- 3.4 Conclusion and Future Work
This project was done as part of the "Bytes of Intelligence: Data Science and AI Internship Program," an innovative platform designed to propel aspiring data scientists and AI enthusiasts to the forefront of technological advancement and real-world problem-solving. The program provides a comprehensive learning experience in data science and AI through workshops, challenges, and mentorship.
The dataset for the Cassava Leaf Disease Classification Challenge is a comprehensive collection of annotated images representing various common diseases affecting cassava plants, one of the most crucial crop resources in tropical and subtropical regions. It includes thousands of high-resolution images categorized into several disease classes, as well as a category for healthy leaves, organized into 5 folders, one per class. Some of the data is mislabeled: inside one class folder, one may find images belonging to another class. The dataset is also highly imbalanced; most of the images belong to Cassava Mosaic Disease (CMD).
The dataset is hosted on Kaggle, a popular platform for data science competitions and collaborative projects.
The data was downloaded from Kaggle.
Conduct an in-depth EDA to understand the dataset's characteristics:
- Distribution of Classes: There were 5 classes:
  - 'Cassava Mosaic Disease (CMD)': 10526
  - 'Healthy': 2061
  - 'Cassava Green Mottle (CGM)': 1909
  - 'Cassava Brown Streak Disease (CBSD)': 1751
  - 'Cassava Bacterial Blight (CBB)': 870
- Image Quality and Variability: Most of the images were high resolution, with an 800×600 shape.
- Data Insights: Some of the data is mislabeled: inside one class folder, one may find images belonging to another class. The dataset is highly imbalanced; most images belong to Cassava Mosaic Disease (CMD).
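The severity of the imbalance can be quantified directly from the class counts reported above (a small sketch in plain Python; the counts are those from the EDA):

```python
# Class counts reported in the EDA.
class_counts = {
    "Cassava Mosaic Disease (CMD)": 10526,
    "Healthy": 2061,
    "Cassava Green Mottle (CGM)": 1909,
    "Cassava Brown Streak Disease (CBSD)": 1751,
    "Cassava Bacterial Blight (CBB)": 870,
}

total = sum(class_counts.values())
for name, count in class_counts.items():
    print(f"{name}: {count} ({count / total:.1%})")

# Imbalance ratio: majority class (CMD) vs. minority class (CBB).
ratio = max(class_counts.values()) / min(class_counts.values())
print(f"Imbalance ratio: {ratio:.1f}x")  # 12.1x
```

CMD alone accounts for about 61% of the 17117 images, which is why accuracy alone is a misleading metric for this dataset.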
Preparation of the dataset for modeling: The dataset was turned into a pandas DataFrame holding the labels and image data. The data was then balanced, and the model was tested on both the balanced and the unbalanced data.
As the data was imbalanced, augmentation slightly increased performance, at the cost of a large amount of additional training time.
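The balancing step is not detailed here; one common approach for such a labeled image list is random undersampling of the larger classes. The sketch below assumes records stored as (image_path, label) tuples; the `undersample` helper and the `per_class` cap are illustrative assumptions, not the project's actual code:

```python
import random
from collections import defaultdict

def undersample(records, per_class, seed=42):
    """Randomly keep at most `per_class` samples from each class.

    `records` is a list of (image_path, label) tuples.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[1]].append(rec)
    balanced = []
    for label, recs in by_label.items():
        rng.shuffle(recs)          # shuffle so the kept subset is random
        balanced.extend(recs[:per_class])
    return balanced

# Toy example: 5 'CMD' images vs. 2 'Healthy' images.
toy = [(f"img{i}.jpg", "CMD") for i in range(5)] + \
      [(f"img{i}.jpg", "Healthy") for i in range(5, 7)]
balanced = undersample(toy, per_class=2)
# Each class now contributes at most 2 samples.
```

Undersampling discards majority-class images, which matches the observation later that the balanced set (8000 images) is smaller than the unbalanced one (13693).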
- Image Resizing: Images were resized to 256×256 to standardize input sizes.
- Rescaling: Pixel values were normalized to the 0 to 1 range.
- Augmentation: RandomFlip, RandomRotation, RandomZoom, and RandomContrast layers were also applied.
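The resizing and rescaling steps can be sketched with NumPy (a simplified stand-in: the project presumably used tf.keras utilities, and the nearest-neighbour resize below is only illustrative; the Keras augmentation layers named above are omitted):

```python
import numpy as np

def resize_nearest(img, size=(256, 256)):
    """Naive nearest-neighbour resize (a stand-in for a real resize op)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    return img[rows][:, cols]

def rescale(img_uint8):
    """Map uint8 pixel values [0, 255] to float32 values in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0

# Toy 800x600 image, matching the typical resolution noted in the EDA.
img = np.random.randint(0, 256, size=(600, 800, 3), dtype=np.uint8)
out = rescale(resize_nearest(img))
```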
Tests were done on both a custom model and a pretrained model. In the custom model, Convolutional and MaxPooling layers were used repeatedly in the first part; in the second part, dense and dropout layers were used after flattening.
The dataset was divided into three subsets. The train dataset provided by Kaggle was split into two sets:
- Training Set: The largest portion, used to train the model; 80% of the Kaggle train dataset was used for training.
- Validation Set: Used to tune model parameters and prevent overfitting; 20% of the Kaggle train dataset was used for validation.
- Test Set: The test dataset provided by Kaggle was reserved for evaluating the model's performance on unseen data.
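The 80/20 split of the Kaggle train set can be sketched in plain Python (the seed and helper name are illustrative assumptions, not the project's actual code):

```python
import random

def train_val_split(records, val_frac=0.2, seed=42):
    """Shuffle records and split them into (train, validation) lists."""
    recs = list(records)
    random.Random(seed).shuffle(recs)   # fixed seed for a reproducible split
    n_val = int(len(recs) * val_frac)
    return recs[n_val:], recs[:n_val]

samples = [f"img{i}.jpg" for i in range(100)]
train, val = train_val_split(samples)
print(len(train), len(val))  # 80 20
```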
Layer Structure:
- In the first section, 7 convolutional layers were used, each followed by a MaxPooling layer; the numbers of filters were 32, 64, and 128, and the kernel size was (3,3).
- In the second part, 2 dense layers were used after flattening, each followed by a dropout layer. The numbers of neurons were 256 and 128, and dropout rates of 35% and 30% were applied to prevent overfitting.
- Activation Functions: The "Softmax" activation function was used in the last layer; "ReLU" was used in all other layers.
- Transfer Learning: A pretrained EfficientNetB0 model was used to compare performance.
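A quick check of the arithmetic behind this stack: assuming padded ("same") 3×3 convolutions that preserve spatial size, each of the 7 pooling steps halves the resolution, so a 256×256 input reaches 2×2 before flattening:

```python
size = 256
for _ in range(7):
    # A padded 3x3 convolution keeps the size; 2x2 max pooling halves it.
    size //= 2
print(size)               # 2: spatial size before flattening
print(size * size * 128)  # 512: features fed into the dense layers (128 final filters)
```

This also shows why 7 pooling stages is roughly the maximum for a 256×256 input: one more halving would leave a 1×1 feature map.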
The model was trained on both balanced and unbalanced data for 50 epochs.
- Training and validation accuracy of EfficientNetB0 on the balanced dataset
- Training and validation accuracy of the custom model on the balanced dataset
- Training and validation accuracy of the custom model on the unbalanced dataset
- Training and validation loss of EfficientNetB0 on the balanced dataset
- Training and validation loss of the custom model on the balanced dataset
- Training and validation loss of the custom model on the unbalanced dataset
- Validation and test accuracy of EfficientNetB0 on the balanced dataset
- Validation and test accuracy of the custom model on the balanced dataset
- Validation and test accuracy of the custom model on the unbalanced dataset
- Confusion matrix of EfficientNetB0 on the balanced dataset
- Confusion matrix of the custom model on the balanced dataset
- Confusion matrix of the custom model on the unbalanced dataset
- Image predictions by EfficientNetB0
- Image predictions by the custom model trained on the balanced dataset
- Image predictions by the custom model trained on the unbalanced dataset
Though the custom model trained on the unbalanced dataset seems to perform better on accuracy, the confusion matrices and image predictions show that the custom model trained on the balanced dataset actually performs better: it predicts all classes, whereas the model trained on the unbalanced dataset predicts mostly Cassava Mosaic Disease (CMD). On the other hand, the unbalanced dataset provided 13693 trainable images compared to 8000 in the balanced set. If the custom model trained on the balanced dataset were trained for more epochs and with more data, it would surely perform better.
Certainly, the pretrained model performed better than the custom model; because of my limited knowledge, my custom model is quite simple.
- Precision, recall, and F1 score of EfficientNetB0
- Precision, recall, and F1 score of the custom model trained on the balanced dataset
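Per-class precision, recall, and F1 can all be derived from a confusion matrix. A minimal computation is sketched below; the toy 2×2 matrix is illustrative, not the project's actual results:

```python
def per_class_metrics(cm):
    """Compute (precision, recall, F1) per class from a confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # column sum minus diagonal
        fn = sum(cm[k]) - tp                        # row sum minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    return metrics

# Toy 2-class confusion matrix (not the project's actual results).
cm = [[8, 2],
      [1, 9]]
metrics = per_class_metrics(cm)
```

Per-class recall is the metric that exposes the failure mode described above: a model that predicts mostly CMD scores near-zero recall on the minority classes even when overall accuracy looks good.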
Though the performance was not satisfactory, as this was my first project I am happy with it. My custom model was designed to reach the best possible performance by fine-tuning the number of neurons, the number of layers, and the dropout percentage, but for various reasons the performance was average.
However, the dataset and model performance were visualized nicely, and the performance of EfficientNetB0 was quite similar to that of the custom model trained on the unbalanced dataset. The dataset was imbalanced and my sampling technique was not up to date. I hope my future work will be satisfactory.