This repository contains the implementation of an image captioning model using CLIP (Contrastive Language-Image Pretraining) with a ResNet-50x4 backbone. The model was developed as part of a thesis project and achieved the following performance metrics:
| Model  | BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR |
|--------|--------|--------|--------|--------|--------|
| RN50x4 | 0.82   | 0.79   | 0.75   | 0.73   | 0.50   |
CLIP is a powerful vision-language pretraining model that learns joint representations of images and text. It has demonstrated state-of-the-art performance on various vision and language tasks.
The captioning model uses CLIP's ResNet-50x4 image encoder as its backbone for feature extraction and is fine-tuned to generate descriptive captions for images, drawing on the joint understanding of visual and textual data.
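As a concrete illustration of the feature-extraction step, the sketch below loads the CLIP RN50x4 encoder via the `clip` package installed in the steps further down and embeds a single image. The image path is a placeholder, and the actual notebook may wire this up differently.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the RN50x4 backbone together with its matching image preprocessing.
model, preprocess = clip.load("RN50x4", device=device)

# "example.jpg" is a placeholder path.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    # RN50x4 produces a 640-dimensional feature vector per image.
    image_features = model.encode_image(image)

print(image_features.shape)  # torch.Size([1, 640])
```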
The dataset used for training consists of 3800 images captured in public spaces, and each image is associated with four captions. This diverse dataset aims to enhance the model's ability to provide detailed and informative captions for various scenarios encountered in public environments.
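The annotation format is not described beyond this, so purely as an illustration, and assuming a simple JSON layout (the real files in this repository may be organized differently), pairing each image with its four captions could look like the snippet below. All file names, field names, and caption texts are hypothetical.

```python
import json

# Hypothetical layout: one record per image, each holding four captions.
records = [
    {
        "image": "images/0001.jpg",
        "captions": [
            "A pedestrian crossing with a traffic light ahead.",
            "People waiting at a zebra crossing.",
            "A crosswalk in front of a row of shops.",
            "A street crossing next to a busy sidewalk.",
        ],
    },
    # ... one record for each of the 3800 images
]

with open("captions.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```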
To run the image captioning model, follow these steps:
- Clone this repository:

  `git clone image-captioning-MM-CLIP-RN50x4-for-visually-impaired-people`
- Install transformers:

  `!pip install transformers`
- Install CLIP:

  `!pip install git+https://github.com/openai/CLIP.git`
- Open the Colab notebook for inference: navigate to the notebook (.ipynb) and follow the steps outlined there, including:
  - a. Image Embedding (a minimal sketch of this step appears after these instructions)
  - b. Train
- The trained model is saved in .pt format. You can find the model weights in the output directory; the file may be named something like `image_captioning_model.pt`.
- Set up Flask deployment:

  `!pip install flask`

  - a. Create a new folder named `deploy` in the project directory.
  - b. Move the trained model file (`image_captioning_model.pt`) to the `deploy` folder.
  - c. Include your Flask deployment script (`main.py`) in the `deploy` folder; a minimal sketch of such a script is shown below.
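For the Image Embedding step in the Colab notebook, the sketch below illustrates one way to precompute CLIP RN50x4 features for the dataset images and store them in a .pt file. The folder name `images/`, the output file name `image_embeddings.pt`, and the dictionary layout are assumptions for illustration, not the notebook's exact code.

```python
import torch
import clip
from PIL import Image
from pathlib import Path

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

embeddings = {}
# "images/" is a placeholder folder holding the dataset images.
for path in sorted(Path("images").glob("*.jpg")):
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        # One 640-dimensional CLIP feature vector per image.
        embeddings[path.name] = model.encode_image(image).squeeze(0).cpu()

# Persist the precomputed features so the training step can reuse them.
torch.save(embeddings, "image_embeddings.pt")
```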
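For the deployment step, the sketch below shows what a minimal `deploy/main.py` could look like. It loads the trained weights from `image_captioning_model.pt` and exposes a single endpoint that accepts an uploaded image and returns a caption. The imports `build_captioning_model` and `generate_caption` stand in for whatever model class and decoding routine the notebook defines; they are placeholders, not functions shipped with this repository.

```python
import io

import torch
import clip
from flask import Flask, request, jsonify
from PIL import Image

# Hypothetical module: replace with the model definition used in the notebook.
from model import build_captioning_model, generate_caption

app = Flask(__name__)
device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP RN50x4 encoder used for image features.
clip_model, preprocess = clip.load("RN50x4", device=device)

# Load the trained captioning weights produced by the training step.
captioner = build_captioning_model()
captioner.load_state_dict(torch.load("image_captioning_model.pt", map_location=device))
captioner.to(device).eval()


@app.route("/caption", methods=["POST"])
def caption():
    # The image is expected as multipart form data under the "image" field.
    uploaded = request.files["image"]
    image = Image.open(io.BytesIO(uploaded.read())).convert("RGB")
    pixels = preprocess(image).unsqueeze(0).to(device)
    with torch.no_grad():
        features = clip_model.encode_image(pixels)
        text = generate_caption(captioner, features)
    return jsonify({"caption": text})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

With the server running locally, a request can then be sent with, for example, `curl -X POST -F "image=@example.jpg" http://localhost:5000/caption`.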