sight-vmware-tmc-intern-hackathon

A hackathon project from the VMware 2020 internship program.

Introduction

This project aims to help visually impaired users perceive the world around them.

Key features include:

Object Detection and Recognition: on a voice command, the speaker announces what is in front of the camera.

This module uses YOLO for object detection and recognition.

Image Capture and Captioning: on a voice command, the camera captures an image and the speaker reads out a generated descriptive sentence.

This module uses DeepAI for caption generation.

Text Reader: detects text in front of the camera and reads it aloud through the speaker.

This module uses Pytesseract for character detection and recognition.
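
For the object detection feature above, the detector's class labels have to be turned into a sentence before they can be spoken. The following is a minimal sketch of that step, not the project's actual code: the function name `describe_detections` and the wording of the sentences are assumptions made for illustration, and the input is assumed to be a plain list of label strings such as those in coco.names.

```python
from collections import Counter


def describe_detections(labels):
    """Turn a list of detected class labels into one sentence for text-to-speech.

    Hypothetical helper: the name and the phrasing are illustrative assumptions.
    """
    if not labels:
        return "I do not see anything I recognize."

    counts = Counter(labels)  # keeps labels in first-seen order
    parts = []
    for label, n in counts.items():
        # Naive pluralization: just append "s" when there is more than one.
        parts.append(f"{n} {label}s" if n > 1 else f"one {label}")

    if len(parts) == 1:
        return f"I see {parts[0]}."
    return "I see " + ", ".join(parts[:-1]) + " and " + parts[-1] + "."


if __name__ == "__main__":
    print(describe_detections(["person", "dog", "person"]))
```

The resulting string could then be handed to the text-to-speech module. Grouping repeated labels keeps the spoken output short when many objects of the same class are in view.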

The project uses the SpeechRecognition package for speech recognition and pyttsx3 for text-to-speech conversion.

Prerequisites

Run project with Webcam

1) brew install portaudio

2) pip3 install -r requirements.txt

3) wget https://pjreddie.com/media/files/yolov3.weights

4) python3 main.py

Run project with Video

1) brew install portaudio

2) pip3 install -r requirements.txt

3) wget https://pjreddie.com/media/files/yolov3.weights

4) python3 main.py --webcam=N
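
The two run modes above differ only in the `--webcam` flag. As a rough sketch of how such a flag could be parsed with the standard `argparse` module (this is an assumption, not the code in main.py; the `Y` default and the helper names `parse_args` and `use_webcam` are hypothetical):

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means read from sys.argv."""
    parser = argparse.ArgumentParser(description="Sight demo launcher (sketch)")
    parser.add_argument(
        "--webcam",
        choices=["Y", "N"],
        default="Y",  # assumed default: use the live webcam
        help="Y to read frames from the webcam, N to use a preloaded video",
    )
    return parser.parse_args(argv)


def use_webcam(args):
    """Return True when frames should come from the webcam."""
    return args.webcam == "Y"


if __name__ == "__main__":
    args = parse_args()
    print("webcam" if use_webcam(args) else "preloaded video")
```

With `argparse`, both `--webcam=N` and `--webcam N` are accepted, so either spelling works on the command line.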

Speech to Text

Run the following commands to install dependencies:

1) brew install portaudio

2) pip3 install -r requirements.txt

3) python3 speech_to_text.py

Image Detection

Run the following commands to install dependencies:

1) wget https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/yolo.h5

2) python3 image_detection_yolo.py

3) brew install tesseract

Instructions

Run main.py and wait for the modules to load. Pass the flag --webcam=N to run the object detection demo on a preloaded video instead of the webcam. Once loading finishes, the program prompts the user for voice input.

Object Detection and Recognition: say "What do you see" or "What is this" into the microphone.
Text Reader: say "Read it" or "Please read" into the microphone.
Image Capture and Captioning: say "Describe it" or "Describe surrounding" into the microphone.
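
The mapping from spoken phrases to features listed above can be sketched as a small lookup. This is an illustrative assumption about how the routing might work, not the project's implementation: the action names (`detect`, `read`, `caption`) and the function `route_command` are hypothetical, and the input is assumed to be the transcript string returned by the speech recognizer.

```python
# Trigger phrases taken from the instructions; action names are hypothetical.
COMMANDS = {
    "detect": ("what do you see", "what is this"),
    "read": ("read it", "please read"),
    "caption": ("describe it", "describe surrounding"),
}


def route_command(transcript):
    """Return the action whose trigger phrase occurs in the transcript, else None."""
    text = transcript.lower().strip()
    for action, phrases in COMMANDS.items():
        # Substring match tolerates extra words around the trigger phrase.
        if any(phrase in text for phrase in phrases):
            return action
    return None


if __name__ == "__main__":
    for spoken in ("What do you see?", "Please read", "Describe it", "Hello"):
        print(spoken, "->", route_command(spoken))
```

Matching on lowercase substrings is a deliberately simple choice: it tolerates capitalization and trailing punctuation from the recognizer, at the cost of possible false matches on longer sentences.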

Results

Object Detection and Recognition:
Image Capture and Captioning:
Text Reader:

Future Work

This project uses the laptop camera as the image source; we plan to integrate a wearable camera, such as camera glasses, for image input.
Integrate with a small single-board computer such as a Raspberry Pi to make the device portable.
Develop additional features that accommodate the needs of visually impaired people.

