This repository contains the code developed during a 16-hour hackathon at Osedea, sponsored by Modal Labs. The goal of the project was to enable our Boston Dynamics robot, SPOT, to receive a voice command describing something to draw and then execute the drawing.
Our project aims to create a seamless pipeline where SPOT can take a vocal command, understand it, generate an image based on the command, and finally draw it. The workflow involves several key components:
- Stable Diffusion XL (on Modal): Generates images from text prompts.
- Whisper: Captures voice audio and transcribes it to text.
- OpenAI: Analyzes the intent and extracts the item to draw.
- Modal: Runs the entire process on serverless GPU compute for efficiency.
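The components above can be sketched as a single end-to-end pipeline. Every function here is a stand-in (the real versions call Whisper, OpenAI, and the Modal endpoints), so the names and return shapes are our assumptions, not the project's actual API:

```python
# Hedged sketch of the voice-to-drawing pipeline; bodies are placeholders.
def transcribe(audio: bytes) -> str:
    # Whisper would transcribe the captured audio here.
    return "draw a cat"

def extract_intent(transcript: str) -> str:
    # OpenAI would analyze intent and pull out the item to draw.
    return transcript.removeprefix("draw a ").strip()

def generate_image(item: str) -> str:
    # Stable Diffusion XL on Modal would return a generated image.
    return f"{item}.png"

def image_to_gcode(image_path: str) -> list[str]:
    # The vectorizer would convert the image into GCODE for SPOT.
    return ["G0 X0 Y0", "G1 X10 Y10"]

def run_pipeline(audio: bytes) -> list[str]:
    item = extract_intent(transcribe(audio))
    return image_to_gcode(generate_image(item))
```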
This monorepo consists of four main sections:
A React TypeScript frontend for the app to visualize the workflow from voice input to image generation to SPOT drawing.
Setup:

```shell
npm install
```

Set up your `.env` file:

```
VITE_SPEECH_RECOGNITION_API_URL=<url_of_the_speech_recognition_backend>
```
Contains the code to send GCODE commands to SPOT.
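As an illustration of what this section deals with, here is a minimal sketch of parsing a GCODE move into coordinates that could be mapped to SPOT's arm. The dialect handled (only `G0`/`G1` with `X`/`Y` words) is an assumption for the example, not the project's actual parser:

```python
# Hypothetical GCODE-line parser: returns (command, x, y) or None.
def parse_gcode_line(line: str):
    parts = line.split()
    if not parts or parts[0] not in ("G0", "G1"):
        return None  # ignore anything that isn't a rapid/linear move
    coords = {p[0].lower(): float(p[1:]) for p in parts[1:] if p[0] in "XY"}
    return parts[0], coords.get("x"), coords.get("y")
```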
Contains the Modal code for image diffusion and image-to-GCODE conversion.
Setup:

```shell
pip install modal
modal setup
modal deploy diffusion.py
modal deploy vectorizer.py
```
This sets up live endpoints for the serverless functions: one generates images on an A10G GPU, and the other handles image-to-GCODE conversion on the CPU.
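A client could then call the deployed endpoints over plain HTTP. The helper below only builds the request; the URLs and JSON payload shape are placeholders we made up for illustration, not the endpoints Modal actually assigns:

```python
# Hypothetical client helper for the deployed Modal endpoints.
import json
from urllib import request

DIFFUSION_URL = "https://example--diffusion.modal.run"    # placeholder URL
VECTORIZER_URL = "https://example--vectorizer.modal.run"  # placeholder URL

def build_diffusion_request(prompt: str) -> request.Request:
    """Build a POST request asking the diffusion endpoint for an image."""
    body = json.dumps({"prompt": prompt}).encode()
    return request.Request(
        DIFFUSION_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be `urllib.request.urlopen(build_diffusion_request("a cat"))`, assuming the endpoint accepts that payload.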
Integrates Whisper for detecting voice commands and extracting intent. Also includes an evaluator using GPT Vision to find the best drawing.
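One way the intent-extraction step could be framed is as a chat request that asks the model for the object to draw and nothing else. The prompt wording and helper name below are our assumptions, not the backend's actual code:

```python
# Hypothetical builder for the intent-extraction chat messages.
def build_intent_messages(transcript: str) -> list[dict]:
    return [
        {
            "role": "system",
            "content": (
                "Extract the single object the user wants drawn. "
                "Reply with the object only."
            ),
        },
        {"role": "user", "content": transcript},
    ]
```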
Setup: Create a virtual environment and install the requirements:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Copy the example environment file and set your OpenAI API key:

```shell
cp .env_example .env
```

Launch the backend with:

```shell
python main.py
```
Please note that the code is somewhat rough due to the limited time we had to write it. Improvements and refactoring are planned for the future.
This project is licensed under the MIT License. See the LICENSE file for details.
We would like to thank Modal Labs for sponsoring this hackathon and providing the resources necessary to bring this project to life.
Feel free to contribute, open issues, or submit pull requests to improve this project. Happy hacking!