sight-vmware-tmc-intern-hackathon

A hackathon project built during the VMware 2020 internship.

Contents

Introduction

Prerequisites

Instructions

Results And Demo

Future Work

Introduction

This project aims to help visually impaired users perceive the world around them.

Key features include:

  • Object Detection and Recognition: on a voice command, the integrated speaker announces the objects in front of the camera.

    This module uses YOLO for object detection and recognition.

  • Image Capture and Captioning: a voice command triggers an image capture from the camera, and a descriptive sentence is generated and spoken as voice output.

    This module uses DeepAI for caption generation.

  • Text Reader: detects text in front of the camera and reads it aloud to the user.

    This module uses Pytesseract for character detection and recognition.
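Raw OCR output usually contains line breaks and stray whitespace that sound awkward when read aloud. The following is a minimal sketch of how the Text Reader step could tidy the text before speaking it. The helper names `clean_ocr_text` and `read_image_text` are hypothetical and not taken from this repository; `pytesseract.image_to_string` is the only Pytesseract call used, and it is imported lazily so the cleanup helper works without Tesseract installed.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse OCR output into a single speakable line."""
    # Rejoin words that were hyphenated across a line break ("exam-\nple").
    # Caveat: a genuine compound word split at a line end is joined as well.
    joined = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse any remaining newlines and runs of spaces.
    return " ".join(joined.split())


def read_image_text(image_path: str) -> str:
    """Run OCR on an image file and return cleaned text (needs Tesseract)."""
    # Imported here so that clean_ocr_text stays usable without these packages.
    from PIL import Image
    import pytesseract

    return clean_ocr_text(pytesseract.image_to_string(Image.open(image_path)))
```

The cleaned string can then be handed to whatever text-to-speech call the project uses.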

The project uses the SpeechRecognition library for speech-to-text and pyttsx3 for text-to-speech conversion.

Prerequisites

Run project with Webcam

1) brew install portaudio

2) pip3 install -r requirements.txt

3) wget https://pjreddie.com/media/files/yolov3.weights

4) python3 main.py

Run project with Video

1) brew install portaudio

2) pip3 install -r requirements.txt

3) wget https://pjreddie.com/media/files/yolov3.weights

4) python3 main.py --webcam=N
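How main.py parses the `--webcam` flag is not shown in this README. As an illustration only, a flag of this shape could be handled with `argparse` as sketched below, assuming it accepts Y/N values and that the webcam is used by default. The names `str_to_bool` and `build_parser` are hypothetical.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret Y/N style values (an assumed convention for this flag)."""
    lowered = value.strip().lower()
    if lowered in {"y", "yes", "true", "1"}:
        return True
    if lowered in {"n", "no", "false", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected Y or N, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Assistive vision demo")
    parser.add_argument(
        "--webcam",
        type=str_to_bool,
        default=True,  # assumption: live webcam unless told otherwise
        help="Y to use the webcam, N to use a preloaded video",
    )
    return parser


# Example: build_parser().parse_args(["--webcam=N"]).webcam evaluates to False
```

With `argparse`, both `--webcam=N` and `--webcam N` are accepted and parse to the same value.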

Speech to Text

Run the following commands to install the dependencies and start the script:

1) brew install portaudio

2) pip3 install -r requirements.txt

3) python3 speech_to_text.py
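The listen-and-respond cycle behind a script like this can be sketched as below. This is not the contents of speech_to_text.py; it is a minimal sketch that assumes the Google Web Speech backend of SpeechRecognition (`recognize_google`, which needs network access) and the default pyttsx3 engine. The function names are hypothetical. The loop takes its listener and speaker as plain callables, so it can be exercised without a microphone.

```python
from typing import Callable, Optional


def run_voice_loop(
    listen: Callable[[], Optional[str]],
    speak: Callable[[str], None],
    handle: Callable[[str], str],
    max_turns: int = 1,
) -> None:
    """Listen, pass the transcript to a handler, and speak the reply."""
    for _ in range(max_turns):
        transcript = listen()
        if not transcript:
            speak("Sorry, I did not catch that.")
            continue
        speak(handle(transcript))


def microphone_listener() -> Callable[[], Optional[str]]:
    """Build a listener backed by SpeechRecognition (requires PortAudio)."""
    import speech_recognition as sr  # lazy import: only needed for real audio

    recognizer = sr.Recognizer()

    def listen() -> Optional[str]:
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        try:
            # Assumed backend; may also raise sr.RequestError when offline.
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return None  # speech was unintelligible

    return listen


def pyttsx3_speaker() -> Callable[[str], None]:
    """Build a speaker backed by pyttsx3."""
    import pyttsx3  # lazy import

    engine = pyttsx3.init()

    def speak(text: str) -> None:
        engine.say(text)
        engine.runAndWait()

    return speak
```

On a machine with a microphone, `run_voice_loop(microphone_listener(), pyttsx3_speaker(), lambda t: "You said " + t)` would echo one utterance back.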

Image Detection

Run the following commands to install the dependencies and start the script:

1) wget https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/yolo.h5

2) python3 image_detection_yolo.py

3) brew install tesseract
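Once the detector returns a list of class labels, they still need to be turned into a sentence for the speaker. The sketch below shows one possible way to do that. `describe_detections` is a hypothetical helper, not a function from image_detection_yolo.py, and the pluralization is deliberately naive.

```python
from collections import Counter
from typing import Iterable


def describe_detections(labels: Iterable[str]) -> str:
    """Turn detected class labels into one sentence for text-to-speech."""
    counts = Counter(labels)
    if not counts:
        return "I do not see anything I recognize."
    # Most frequent objects first; naive plural (just appends "s").
    parts = [
        f"{count} {label}{'s' if count > 1 else ''}"
        for label, count in counts.most_common()
    ]
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"I see {listing}."
```

For example, the labels `["person", "dog", "person"]` are grouped and counted before being spoken, so the user hears each object type once.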

Instructions

Run main.py and wait for the modules to load. Pass the flag --webcam N to run the object detection demo on a preloaded video instead of the webcam. Once loading finishes, the program prompts the user for voice input.

  • Object Detection and Recognition: say "What do you see" or "What is this" into the microphone.

  • Text Reader: say "Read it" or "Please read" into the microphone.

  • Image Capture and Captioning: say "Describe it" or "Describe surrounding" into the microphone.
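The mapping from spoken phrases to modules listed above can be sketched as a small lookup. The phrases come from this README; the module keys, the `COMMANDS` table, and the substring-matching strategy in `route_command` are assumptions for illustration and may differ from what main.py does.

```python
import re
from typing import Optional

# Trigger phrases from the instructions, keyed by hypothetical module names.
COMMANDS = {
    "object_detection": ("what do you see", "what is this"),
    "text_reader": ("read it", "please read"),
    "image_captioning": ("describe it", "describe surrounding"),
}


def route_command(transcript: str) -> Optional[str]:
    """Return the module whose trigger phrase appears in the transcript."""
    # Lowercase, drop punctuation, and collapse whitespace before matching.
    normalized = re.sub(r"[^a-z ]", "", transcript.lower())
    normalized = " ".join(normalized.split())
    for module, phrases in COMMANDS.items():
        if any(phrase in normalized for phrase in phrases):
            return module
    return None  # unrecognized command
```

Substring matching keeps the sketch tolerant of filler words, so "Can you describe it for me?" still routes to captioning, at the cost of occasional false matches.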

Results

  • Object Detection and Recognition: [image: Object Detection and Recognition]

  • Image Capture and Captioning: [image: Image Capture and Captioning]

  • Text Reader: [image: Text Reader]

Future Work

  • The project currently uses the laptop camera as its image source; we plan to integrate a wearable camera, such as camera glasses, for image input.
  • Integrate with a small single-board computer such as a Raspberry Pi to make the device portable.
  • Develop additional features that accommodate the needs of visually impaired people.