Empower your streams with dynamic voice interactions.
AI-Twitch-TTS is a real-time Twitch Text-to-Speech application built for interactive streaming experiences. The project orchestrates WebSocket connections for audio streaming, processes chat requests, and interfaces with external APIs for voice synthesis. It offers customizable voice options, real-time chat handling, and automated WebSocket reconnections, enhancing viewer engagement on Twitch streams. The project's modular design supports seamless integration of dependencies, automated testing, and CI/CD workflows for efficient development and deployment.
Example Usage from Samifying (video: AT-cm_GFGdmTEpggCtiNbY0qTQ0w.mp4)
| | Feature | Description |
---|---|---|
⚙️ | Architecture | Server-side application using WebSockets for real-time audio streaming, with client-side support for Twitch Text-to-Speech functionality. Maintains a web server to handle HTTP requests and WebSocket connections. |
🔩 | Code Quality | Well-structured codebase with clear separation of concerns, detailed inline comments, consistent naming conventions, and adherence to best practices. Follows the principles of clean code and maintainable architecture. |
📄 | Documentation | Adequate documentation with detailed explanations for modules like logging, environment setup, WebSocket handling, and HTTP endpoints. |
🔌 | Integrations | Relies on external libraries like godotenv, go-randomdata, WebSocket for Go, and others to enhance functionality like environment variable loading, random data generation, WebSocket communication, and real-time audio streaming. |
🧩 | Modularity | Codebase exhibits modularity through separate modules for logging, WebSocket handling, text-to-speech requests, alerts retrieval, and Pally WebSocket connections. Modules are designed for reusability and maintainability. |
System Requirements:
- Internet
- ffmpeg
Download latest release:
- Create a ./alerts/<channel> folder with alert sound(s) in it for Pally (optional)
- Create a ./effects folder with effect sound(s) in it for effect tags
- Create a .env file in the same directory
- Fill out the required environment variables explained below and in the .env.example
Docker Compose:
- Create a ./effects folder with effect sound(s) in it for effect tags
- Create a ./alerts/<channel> folder with alert sound(s) in it for Pally (optional)
- Either create a .env file with the required environment variables explained below and in the .env.example, or change them directly in the compose file.
docker-compose.yml
version: "3.8"
services:
  ai-twitch-tts:
    image: johnnycyan/ai-twitch-tts:main
    container_name: tts
    ports:
      - 6969:8080
    environment:
      - ELEVENLABS_KEY=${ELEVENLABS_KEY}
      - ELEVENLABS_PRICE=${ELEVENLABS_PRICE}
      - SERVER_URL=${SERVER_URL}
      - SENTRY_URL=${SENTRY_URL}
      - TTS_KEY=${TTS_KEY}
      - PALLY_KEYS=${PALLY_KEYS}
      - PALLY_VOICES=${PALLY_VOICES}
      - VOICES=${VOICES}
      - VOICE_MODELS=${VOICE_MODELS}
      - VOICE_STYLES=${VOICE_STYLES}
      - VOICE_MODIFIERS=${VOICE_MODIFIERS}
      - MONGO_HOST=mongodb
      - MONGO_PORT=27017
      - MONGO_USER=${MONGO_USER}
      - MONGO_PASS=${MONGO_PASS}
      - MONGO_DB=${MONGO_DB}
      - FFMPEG_ENABLED=true
    volumes:
      - ./effects:/app/effects
      - ./alerts:/app/alerts
    depends_on:
      - mongodb
  mongodb:
    image: mongo
    container_name: tts-mongo
    restart: always
    environment:
      - MONGO_INITDB_ROOT_USERNAME=${MONGO_USER}
      - MONGO_INITDB_ROOT_PASSWORD=${MONGO_PASS}
    volumes:
      - mongodb_data:/data/db
volumes:
  mongodb_data:
Variable | Description |
---|---|
ELEVENLABS_KEY | ElevenLabs API key |
SERVER_URL | Host name where the server will be reachable, without the protocol (e.g. example.com) |
TTS_KEY | Secret key used to authenticate TTS generation requests |
VOICES | JSON string list of name/id pairs for ElevenLabs voices |
VOICE_MODELS | JSON string list of name/model pairs for ElevenLabs voices (optional) |
VOICE_STYLES | JSON string list of name/style pairs for ElevenLabs voices (optional) |
VOICE_MODIFIERS | JSON string list of name/modifier pairs for ElevenLabs voices (optional) |
PALLY_KEYS | JSON string list of name/key pairs for Pally (optional) |
PALLY_VOICES | JSON string list of channel/voice pairs for Pally (optional) |
SENTRY_URL | URL for Sentry logging of the client (optional) |
MONGO_HOST | MongoDB host (optional) |
MONGO_PORT | Port for MongoDB (optional) |
MONGO_USER | Username for MongoDB (optional) |
MONGO_PASS | Password for MongoDB (optional) |
MONGO_DB | Database name for MongoDB (optional) |
ELEVENLABS_PRICE | Monthly price of the ElevenLabs subscription (optional) |
FFMPEG_ENABLED | Boolean indicating whether ffmpeg is installed (ffmpeg is required) |
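As a quick sanity check before starting the server, the table above can be turned into a small pre-flight script. The following is only a sketch, not part of AI-Twitch-TTS: the `missingVars` helper and the `requiredVars` list are hypothetical, and treating every variable not marked "(optional)" as required is an assumption based on the table.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars is an assumption: every variable the table does not mark
// as optional is treated as required.
var requiredVars = []string{
	"ELEVENLABS_KEY", "SERVER_URL", "TTS_KEY", "VOICES", "FFMPEG_ENABLED",
}

// missingVars returns the required variables that lookup reports as
// unset or empty. Passing the lookup function in keeps it easy to test.
func missingVars(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range requiredVars {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingVars(os.LookupEnv); len(missing) > 0 {
		fmt.Println("missing required variables:", missing)
		os.Exit(1)
	}
	fmt.Println("all required variables are set")
}
```

Run it in the same shell (or with the same .env values exported) that will launch the server; it exits with status 1 if anything in the assumed required list is unset.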
⚠️ Running without an SSL connection has not been tested and might not work.
- Run AI-Twitch-TTS using the command below:
- Logging mode is optional. Options: info, debug, fountain
$ ./AI-Twitch-TTS <port> <logging-mode>
or
$ AI-Twitch-TTS.exe <port> <logging-mode>
- Add this to your OBS as a browser source
http(s)://$SERVER_URL/?channel=<username>
- Generate TTS by accessing this URL either through a browser or a Twitch chat bot (voice is optional):
- See Advanced Usage for how to use multiple voices and effects in one message.
http(s)://$SERVER_URL/tts?channel=<username>&key=$TTS_KEY&voice=<voicename>&text=<text to generate>
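When a chat bot or script builds the TTS URL, the text has to be URL-encoded, since messages usually contain spaces and punctuation. Below is a minimal sketch of one way to assemble the request URL in Go. The `buildTTSURL` helper is hypothetical, the `https` scheme is an assumption, and the parameter names (`channel`, `key`, `voice`, `text`) are taken from the URL pattern above.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildTTSURL assembles the /tts request URL from the pattern above.
// serverURL is the host without a protocol, matching SERVER_URL.
// The voice parameter is left out when empty, since it is optional.
func buildTTSURL(serverURL, channel, key, voice, text string) string {
	q := url.Values{}
	q.Set("channel", channel)
	q.Set("key", key)
	q.Set("text", text)
	if voice != "" {
		q.Set("voice", voice)
	}
	u := url.URL{
		Scheme:   "https", // assumption: the server is served over SSL
		Host:     serverURL,
		Path:     "/tts",
		RawQuery: q.Encode(), // percent-encodes values and sorts keys
	}
	return u.String()
}

func main() {
	// All values here are placeholders.
	fmt.Println(buildTTSURL("example.com", "streamer", "secret", "bob", "hello world & goodbye"))
}
```

`url.Values.Encode` emits the parameters sorted by key, so the order differs from the pattern above; query parameter order should not matter to an HTTP server.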
Advanced Usage
[v-voicename] is a voice tag: any text written after it is spoken with that voice.
[e-effectname] is an effect tag: it plays the named effect sound.
(reverb) adds reverb to a TTS message.
If you use a tag in a message, you MUST put a voice tag before every piece of text you want spoken.
✅[v-voice] this is text and then an effect [e-effect]
❌this text has no voice tag [e-effect]
✅[v-voice] this is text and then an effect [e-effect] [v-voice] and then more text
❌[v-voice] this is text and then an effect [e-effect] this text has no voice tag
✅[e-effect] [v-voice] this is text
❌[e-effect] this text has no voice tag
Example of reverb:
(reverb) this is reverbed text
[v-voice] (reverb) this is reverbed text with a specific voice.
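The tag rule can be checked before a message is sent, for example in a chat bot. The following Go sketch encodes the rule as inferred from the examples above: once a message contains any tag, each piece of text must follow a voice tag, and an effect tag ends the current voice. It is an illustration only, not the project's actual parser; the `validTagUsage` function and its regular expression are hypothetical, and `(reverb)` is treated as ordinary text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagPattern matches voice tags like [v-name] and effect tags like [e-name].
var tagPattern = regexp.MustCompile(`\[(v|e)-([^\]\s]+)\]`)

// validTagUsage reports whether msg follows the rule inferred from the
// examples: a message without tags is always fine; otherwise every
// non-empty stretch of text must come after a voice tag, with no effect
// tag in between.
func validTagUsage(msg string) bool {
	matches := tagPattern.FindAllStringSubmatchIndex(msg, -1)
	if len(matches) == 0 {
		return true
	}
	voiceActive := false
	pos := 0
	for _, m := range matches {
		// m[0]:m[1] spans the whole tag, m[2]:m[3] spans "v" or "e".
		if strings.TrimSpace(msg[pos:m[0]]) != "" && !voiceActive {
			return false
		}
		voiceActive = msg[m[2]:m[3]] == "v"
		pos = m[1]
	}
	// Text after the last tag also needs an active voice.
	return strings.TrimSpace(msg[pos:]) == "" || voiceActive
}

func main() {
	for _, msg := range []string{
		"[v-voice] this is text and then an effect [e-effect]",
		"[e-effect] this text has no voice tag",
	} {
		fmt.Printf("%q valid: %v\n", msg, validTagUsage(msg))
	}
}
```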
This project is protected under the MIT License. For more details, refer to the LICENSE file.