TransVisio

💡What does it do?

TransVisio is a tool for transcribing and translating text from several input formats, including video and audio. It runs a pipeline of AI models that transcribes the input and translates the resulting text in real time.

Transcription Models Supported

  • Whisper 20231117 (Online)
  • Faster-Whisper v1.0.3 (Offline)

Translation Models Supported

  • GPT-4o
  • GPT-4 Turbo
  • GPT-3.5 Turbo
  • Gemini 1.5 Pro
  • Gemini 1.5 Flash

Features

  • Inputs supported:
      • Subtitle files (*.srt *.ass *.ssa).
      • Video files (*.mp4 *.mkv *.webm *.flv *.avi *.mov *.wmv *.m4v).
      • Audio files (*.wav *.ogg *.mp3 *.aac *.flac *.m4a *.oga *.opus).
      • Excel files (*.xlsx *.csv).
  • Input Sentences: Specify the number of input sentences to translate at once (higher = more context).
  • Pause/Resume: Pause and resume the translation process at any point.
  • Reverse Translation: Switch the direction of the translated output for easier editing in certain languages.
  • Edit and Align: Remove or edit the translated output and input with automatic row alignment.
  • Specify Start Time and Duration: Set the starting point and the length of the transcription for video and audio inputs.
  • Temperature Setting: Control the randomness of the translation model’s output. Lower values make the output more deterministic, while higher values increase diversity.
  • Save Transcriptions: Export video and audio transcriptions to Excel.
  • Themes: Choose between light and dark themes.
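
The Input Sentences setting can be pictured as simple batching: sentences are grouped into chunks of a fixed size, and each chunk is sent to the translation model as one request, so a larger chunk gives the model more surrounding context. The sketch below is illustrative only and is not taken from the TransVisio source; the names `batch_sentences` and `build_prompt`, and the prompt wording, are hypothetical.

```python
from typing import Iterator, List


def batch_sentences(sentences: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]


def build_prompt(batch: List[str], target_language: str) -> str:
    """Number each sentence so translated rows can be matched back to inputs.

    The prompt text is a hypothetical example, not TransVisio's actual prompt.
    """
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(batch, start=1))
    return f"Translate each numbered line into {target_language}:\n{numbered}"


if __name__ == "__main__":
    lines = ["Hello.", "How are you?", "See you tomorrow."]
    for batch in batch_sentences(lines, batch_size=2):
        print(build_prompt(batch, "Arabic"))
        print("---")
```

Numbering the lines inside each batch is one way to keep translated rows aligned with their source rows when several sentences share a request.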

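To make the Temperature Setting more concrete, the sketch below shows the general mechanism by which temperature reshapes a model's next-token probabilities: scores are divided by the temperature before the softmax. This is a generic illustration with made-up scores, not TransVisio code, and hosted translation models apply this step internally. The function name `softmax_with_temperature` is hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities, scaled by a positive temperature."""
    if temperature <= 0:
        # This sketch needs a positive value because it divides by temperature.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(scores, t)
        print(t, [round(p, 3) for p in probs])
```

At a lower temperature the highest-scoring candidate takes a larger share of the probability, which is why output becomes more deterministic; at a higher temperature the distribution flattens and output becomes more varied.
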
Demo

Usage Notes:

  • Specify the Start Time and Duration before selecting the video or audio input.
  • Online Whisper requires an API key and is limited to a 25 MB input size.
  • Offline Whisper does not require a key but requires downloading a model (e.g., tiny, small) on first use.
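
The constraints in these notes can be expressed as a small pre-flight check that decides which transcription backend is usable for a given file. This is a minimal sketch under stated assumptions, not TransVisio's implementation: the function names and the "online"/"offline" labels are hypothetical, and "25 MB" is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os
from typing import Optional

# Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
ONLINE_LIMIT_BYTES = 25 * 1024 * 1024


def choose_backend_for_size(size_bytes: int, api_key: Optional[str]) -> str:
    """Return "online" only when a key is present and the input fits the limit."""
    if api_key and size_bytes <= ONLINE_LIMIT_BYTES:
        return "online"
    # No key, or the file is too large: fall back to the local model.
    return "offline"


def choose_backend(path: str, api_key: Optional[str]) -> str:
    """Same decision, reading the size of an existing file from disk."""
    return choose_backend_for_size(os.path.getsize(path), api_key)


if __name__ == "__main__":
    print(choose_backend_for_size(10 * 1024 * 1024, "example-key"))
    print(choose_backend_for_size(30 * 1024 * 1024, "example-key"))
    print(choose_backend_for_size(10 * 1024 * 1024, None))
```

Because Start Time and Duration reduce how much of the input is transcribed, a real check might measure the trimmed clip rather than the whole file; that detail is left out here.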

Disclaimer

  • TransVisio is part of a collaborative research project funded by the Abdul Hameed Shoman Foundation (Agreement Number: 230800351).
  • Hosting Institution: The project is hosted by the English Language and Translation Department at the Applied Science Private University.