This is a demo of a LiveKit agent that connects to a live streaming session and automatically transcribes and translates the host's speech into text captions for a target language. Every listener that connects to the session can set their preferred language and receive live captions of the host in that language.
It uses:
- 🌐 LiveKit for transport
- 🤖 LiveKit Agents for the backend
- 👂 Deepgram STT for transcriptions
- 🌍 OpenAI GPT-4o for translations
Here's what's happening in this demo:
- When a new LiveKit room is created via a user joining a "party", an agent joins the party on the backend and subscribes to the host user's microphone stream. If no host is present, the agent will wait for one to arrive and subscribe to their mic stream.
- When the host speaks, the agent receives their audio stream and runs it through speech-to-text (STT) to produce a transcription. This demo currently uses Deepgram for transcriptions, but any STT provider can be used.
- By default, every user's (including the host's) target language for captions is set to English. Thus, transcriptions coming out of STT will be sent to every user via STTForwarder.
- If any users (including the host) connected to the session have set their target language to something other than English, the agent additionally feeds the transcriptions coming from STT to a Translator for that target language. The demo currently supports English, French, German, Spanish, and Japanese.
- The translator will take the text from STT and pass it as part of a prompt to an LLM, asking the LLM to translate the text to the target language.
- The output from the LLM is then sent to users via STTForwarder and rendered by the client application.
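The fan-out described above can be sketched in plain Python. This is a minimal illustration, not the demo's actual agent code: `captions_by_language`, `build_translation_prompt`, the `LANGUAGE_NAMES` table, and the prompt wording are all hypothetical, and the LLM call is replaced by a stub so the example runs without LiveKit, Deepgram, or OpenAI credentials.

```python
from typing import Callable, Dict, Iterable

# Assumption: the host speaks English, the demo's default caption language.
SOURCE_LANGUAGE = "en"

# Hypothetical code-to-name table covering the languages the demo lists.
LANGUAGE_NAMES: Dict[str, str] = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "ja": "Japanese",
}


def build_translation_prompt(text: str, target_code: str) -> str:
    """Illustrative prompt only; the demo's actual prompt wording may differ."""
    target = LANGUAGE_NAMES[target_code]
    return (
        f"Translate the following text to {target}. "
        f"Reply with the translation only.\n\n{text}"
    )


def captions_by_language(
    transcript: str,
    listener_languages: Iterable[str],
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """Return one caption per language that at least one listener requested."""
    # The raw transcript is always forwarded for the default language.
    captions = {SOURCE_LANGUAGE: transcript}
    # Translate once per distinct language, not once per listener.
    for code in sorted(set(listener_languages) - {SOURCE_LANGUAGE}):
        captions[code] = translate(build_translation_prompt(transcript, code))
    return captions


# Usage with a stub standing in for the LLM call.
calls = []


def stub_translate(prompt: str) -> str:
    calls.append(prompt)
    return f"<translation #{len(calls)}>"


result = captions_by_language(
    "Welcome to the party", ["en", "fr", "fr", "ja"], stub_translate
)
```

Translating once per distinct language rather than once per listener keeps the number of LLM calls independent of audience size. Whether the demo deduplicates this way is an assumption.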
cd server
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
- add values for keys in .env
python main.py dev
cd client/web
pnpm i
cp .env.example .env.local
- add values for keys in .env.local
pnpm dev
- open a browser and navigate to http://localhost:3000
- For this demo, there can only be one host.
- There are a couple of known bugs at the moment:
  - Sometimes joining as a listener ends up showing the agent as the host, and things look broken. Refreshing and rejoining should fix it.
  - Opening more than one browser window and connecting a host and one or more listeners degrades STT performance. The cause is not yet known.
- You can easily extend this demo to support other languages by editing the list of languages in the agent code.
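As a rough sketch of what extending the language list might look like, assuming the agent keeps a simple code-to-name mapping: the names `SUPPORTED_LANGUAGES`, `register_language`, and `resolve_language` below are hypothetical, so check the agent code for the real structure before copying anything.

```python
from typing import Dict

DEFAULT_LANGUAGE = "en"

# Hypothetical table; the real list lives in the agent code and may be shaped differently.
SUPPORTED_LANGUAGES: Dict[str, str] = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "ja": "Japanese",
}


def register_language(code: str, name: str) -> None:
    """Add a language to the table, normalizing the code to lowercase."""
    code = code.strip().lower()
    name = name.strip()
    if not code or not name:
        raise ValueError("language code and name must be non-empty")
    SUPPORTED_LANGUAGES[code] = name


def resolve_language(requested: str) -> str:
    """Fall back to the default when a listener asks for an unsupported language."""
    code = requested.strip().lower()
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


# Example: add Korean, then resolve listener preferences.
register_language("ko", "Korean")
```

If the client keeps its own list of selectable languages, it would likely need a matching entry. That is an assumption about the client, not something this document states.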
For a more general overview of LiveKit Agents and its full set of capabilities, see the documentation: https://docs.livekit.io/agents/