A real-time multimodal chat application powered by Google's Gemini AI. It connects clients to Gemini over WebSocket connections and supports both text-only and multimodal (text + image) interactions.
Features:
- Real-time Gemini AI Integration
- Multimodal Support (Text + Images)
- WebSocket-based Real-time Communication
- Secure Environment Configuration
- Async/Await Implementation
- Cross-platform Support
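Because multimodal messages travel over a WebSocket, a client has to package a text prompt and an optional image into a single text frame. The sketch below shows one common way to do that: base64-encode the image bytes so they fit inside JSON. The field names and the build_message/parse_message helpers are hypothetical and do not describe this project's actual wire format.

```python
import base64
import json
from typing import Optional, Tuple


def build_message(text: str, image_bytes: Optional[bytes] = None,
                  mime_type: str = "image/jpeg") -> str:
    """Serialize a text prompt plus an optional image into one JSON string.

    Field names ("text", "image", "mime_type", "data") are hypothetical.
    """
    message = {"text": text}
    if image_bytes is not None:
        # JSON cannot carry raw bytes, so the image is base64-encoded.
        message["image"] = {
            "mime_type": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    return json.dumps(message)


def parse_message(raw: str) -> Tuple[str, Optional[bytes]]:
    """Inverse of build_message: recover the text and the raw image bytes."""
    message = json.loads(raw)
    image = message.get("image")
    image_bytes = base64.b64decode(image["data"]) if image else None
    return message["text"], image_bytes


if __name__ == "__main__":
    raw = build_message("What is in this picture?", b"\x89PNG\r\n", "image/png")
    print(raw)
    print(parse_message(raw))
```

Base64 inflates binary data by roughly a third; that overhead is the price of keeping every frame as plain JSON text.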
Requirements:
- google-genai==0.2.2
- websockets
- python-dotenv
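Before starting the server, it can help to confirm that these dependencies are actually installed in the active environment. The sketch below uses the standard library's importlib.metadata; check_requirements and REQUIREMENTS are hypothetical names, and only the simple "name" and "name==version" forms listed above are handled.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List

# Mirrors the requirements listed above; adjust if requirements.txt changes.
REQUIREMENTS = ["google-genai==0.2.2", "websockets", "python-dotenv"]


def check_requirements(requirements: List[str]) -> List[str]:
    """Return a list of problems; an empty list means everything matches."""
    problems = []
    for req in requirements:
        # Split "name==version"; an unpinned entry leaves `pinned` empty.
        name, _, pinned = req.partition("==")
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if pinned and installed != pinned:
            problems.append(f"{name} {installed} installed, {pinned} required")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(REQUIREMENTS):
        print(problem)
```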
Installation:
- Clone the repository:
  git clone https://github.com/jadouse5/gemini-realtime-multimodal.git
  cd gemini-realtime-multimodal
- Set up a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Configure the environment: create a .env file in the repository root containing:
  GOOGLE_API_KEY=your_gemini_api_key_here
- Start the server:
  python main.py
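- Connect to the WebSocket server: in a separate terminal, serve the client files over HTTP, then open http://localhost:8000 in a browser:
  python -m http.server 8000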
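Once running, the server presumably has to forward Gemini's streamed replies to each connected client without blocking other connections, which is where the async/await implementation comes in. main.py is not reproduced in this README, so the following is only a stdlib sketch of that relay pattern under stated assumptions: fake_model_stream, relay, and the send callback are hypothetical, and fake_model_stream stands in for a real google-genai streaming call rather than mimicking its API.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def fake_model_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming model call (not the google-genai API)."""
    for word in f"You said: {prompt}".split():
        await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
        yield word + " "


async def relay(prompt: str, send: Callable[[str], Awaitable[None]]) -> int:
    """Forward each streamed chunk to the client as soon as it arrives."""
    count = 0
    async for chunk in fake_model_stream(prompt):
        await send(chunk)  # in a real server: the WebSocket connection's send
        count += 1
    return count


async def main() -> List[str]:
    received: List[str] = []

    async def send(chunk: str) -> None:
        # Collect chunks locally instead of writing to a socket.
        received.append(chunk)

    await relay("hello there", send)
    return received


if __name__ == "__main__":
    print("".join(asyncio.run(main())))
```

Passing send in as a callback keeps relay independent of the transport, so the same loop can be exercised in tests without opening a socket.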
Environment variables:
- GOOGLE_API_KEY: Your Gemini API key
- PORT: WebSocket server port (default: 8765)
- HOST: WebSocket server host (default: localhost)
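These variables would normally be loaded with python-dotenv's load_dotenv() and then read from os.environ. To stay dependency-free, the sketch below swaps in a tiny stdlib parser; load_env_file and get_config are hypothetical helpers, and the defaults simply mirror the ones listed above.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv().

    Handles only plain KEY=value lines and # comments (no quoting or
    variable expansion). Like load_dotenv's default, it does not overwrite
    variables that are already set in the environment.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # a missing .env file is not an error
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip())


def get_config() -> Dict[str, object]:
    """Read the documented settings, applying the documented defaults."""
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set (add it to .env)")
    return {
        "api_key": api_key,
        "host": os.environ.get("HOST", "localhost"),
        "port": int(os.environ.get("PORT", "8765")),
    }
```

Calling load_env_file() once at startup and then get_config() fails fast with a clear message when the API key is missing, instead of surfacing as an authentication error on the first Gemini request.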
Contributions are welcome! To contribute:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License; see the LICENSE file for details.
Author: Jad Tounsi (GitHub: @jadouse5)
Acknowledgments:
- Google Gemini AI team
- WebSocket protocol contributors
- Python async/await community
For detailed documentation on the Gemini API, visit:
Made with ❤️ and Python