LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A simple, high-quality voice conversion tool focused on ease of use and performance
High-quality and streaming speech-to-speech interactive agent in a single file: a full-duplex voice interaction prototype implemented in just one file!
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering, but not limited to, end-to-end speech interaction, end-to-end speech translation, and speech recognition.
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
If you've ever wanted to talk to your AI waifu with high-quality characters and voices for character voicing, try Soul of Waifu. Don't miss the chance to bring your dream to life!
💬 "Realtime" voice transcription and cloning using ElevenLabs's API.
Svelte component for using the OpenAI Realtime API.
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
Speech-to-text-to-speech using ElevenLabs.
Swift SDK for Prototyping AI Speech Generation
Chatter Box is an Android app capable of voice, text, and image text translation, as well as end-to-end chat translation.
A user-friendly interface for ElevenLabs' API with added audio transcription capability.
End-to-end AI voice assistant pipeline with Whisper for speech-to-text, a Hugging Face LLM for response generation, and Edge-TTS for text-to-speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization (see the pipeline sketch after this list).
Systems submitted to IWSLT 2022 by the MT-UPC group.
Speech-to-Speech translation dataset for German and English (text and speech quadruplets).
CtrlSpeak is a voice assistant activated with [Control]+Q, listening and responding only when you want.
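For readers new to the STT → LLM → TTS pattern used by several of the projects above, here is a minimal, hypothetical sketch. It assumes the openai-whisper, transformers, and edge-tts packages are installed; the model names, voice, and file paths are placeholders and are not taken from any of the repositories listed here.

```python
# Minimal, hypothetical sketch of the STT -> LLM -> TTS pattern described above.
# Assumes: pip install openai-whisper transformers edge-tts
# Model names, voice, and file paths below are placeholders.
import asyncio

import whisper                     # speech-to-text
from transformers import pipeline  # response generation
import edge_tts                    # text-to-speech


def transcribe(audio_path: str) -> str:
    # Whisper returns a dict whose "text" field holds the transcript.
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]


def generate_reply(prompt: str) -> str:
    # Any instruction-tuned Hugging Face model can be dropped in here;
    # "gpt2" is only a lightweight placeholder. Note that for causal LMs
    # the returned text includes the prompt.
    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]


async def synthesize(text: str, out_path: str) -> None:
    # edge-tts writes the synthesized speech to an MP3 file.
    communicate = edge_tts.Communicate(text, "en-US-AriaNeural")
    await communicate.save(out_path)


if __name__ == "__main__":
    question = transcribe("question.wav")        # speech in
    reply = generate_reply(question)             # text reply
    asyncio.run(synthesize(reply, "reply.mp3"))  # speech out
```

A full assistant like the one described above would additionally use VAD to segment the live microphone stream and stream the synthesized audio back as it is generated to reduce latency.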