
💬 ASR FastAPI server using faster-whisper and pyannote-audio.


corystevens/wordcab-transcribe


Wordcab Transcribe 💬

FastAPI-based API for transcribing audio files using faster-whisper and pyannote-audio.

More details about this project are available in this blog post.

Key features

  • 🤗 Open-source: Our project is open-source and based on open-source libraries, allowing you to customize and extend it as needed.
  • ⚡ Fast: faster-whisper, a reimplementation of Whisper built on the CTranslate2 inference engine, makes transcription much faster than the reference openai/whisper implementation.
  • 🐳 Easy to deploy: You can deploy the project on your workstation or in the cloud using Docker.
  • 🔥 Batch requests: The API supports batch requests, so you can transcribe multiple audio files at once.
  • 💸 Cost-effective: Because the project is open source, you don't have to pay for costly ASR platforms.
  • 🫶 Easy-to-use API: With just a few lines of code, you can use the API to transcribe audio files or even YouTube videos.

Requirements

  • Linux (tested on Ubuntu Server 22.04)
  • Python 3.9
  • Docker
  • NVIDIA GPU + NVIDIA Container Toolkit

To learn more about the prerequisites to run the API, check out the Prerequisites section of the blog post.

Docker commands

Build the image.

docker build -t wordcab-transcribe:latest .

Run the container.

docker run -d --name wordcab-transcribe \
    --gpus all \
    --shm-size 1g \
    --restart unless-stopped \
    -p 5001:5001 \
    wordcab-transcribe:latest

Test the API

Once the container is running, you can test the API.

The API documentation is available at http://localhost:5001/docs.
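The container may not answer immediately after docker run. Here is a small sketch of a helper that polls the documentation page mentioned above until it responds, using only the Python standard library. The function name wait_for_api and its timeout and interval defaults are hypothetical choices, not part of the project.

```python
import time
import urllib.request


def wait_for_api(url="http://localhost:5001/docs", timeout=120.0, interval=2.0):
    """Poll `url` until it returns HTTP 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused/reset connections and socket
            # timeouts are all OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call wait_for_api() right after starting the container and only send transcription requests once it returns True.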

Using cURL

  • Audio file:
curl -X 'POST' \
  'http://localhost:5001/api/v1/audio' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@/path/to/audio/file.wav'
  • YouTube video:
curl -X 'POST' \
  'http://localhost:5001/api/v1/youtube' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "url": "https://youtu.be/dQw4w9WgXcQ"
}'

Using Python

  • Audio file:
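import requests

filepath = "/path/to/audio/file.wav"  # or mp3
# Open the file in a context manager so it is closed after the upload
with open(filepath, "rb") as f:
    files = {"file": f}
    response = requests.post("http://localhost:5001/api/v1/audio", files=files)
response.raise_for_status()  # raise an error for 4xx/5xx responses
print(response.json())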
  • YouTube video:
import requests

url = "https://youtu.be/dQw4w9WgXcQ"
data = {"url": url}
response = requests.post("http://localhost:5001/api/v1/youtube", json=data)
print(response.json())
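The feature list mentions batch requests, but the request format for server-side batches is not documented in this section. As a client-side alternative, here is a minimal sketch that sends one request per file to the /api/v1/audio endpoint and runs several uploads concurrently with a thread pool. The helper names (find_audio_files, post_audio, transcribe_many), the 600-second timeout and the worker count are hypothetical choices, and the sketch assumes the server accepts concurrent uploads. requests is imported inside post_audio so the other helpers can be reused without it.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable

# Endpoint from the examples above; adjust host/port to your deployment.
AUDIO_ENDPOINT = "http://localhost:5001/api/v1/audio"
# Assumption: only .wav and .mp3 files are picked up (the formats named above).
AUDIO_SUFFIXES = {".wav", ".mp3"}


def find_audio_files(directory: str) -> list[Path]:
    """Return the .wav/.mp3 files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def post_audio(path: Path, url: str = AUDIO_ENDPOINT, timeout: float = 600.0) -> dict:
    """Upload one file to the audio endpoint and return the decoded JSON body."""
    import requests  # imported here so the other helpers work without it

    with open(path, "rb") as f:
        response = requests.post(url, files={"file": f}, timeout=timeout)
    response.raise_for_status()
    return response.json()


def transcribe_many(
    paths: Iterable[Path],
    transcribe: Callable[[Path], dict] = post_audio,
    max_workers: int = 4,
) -> dict:
    """Send one request per file concurrently; map each path to its result."""
    paths = list(paths)  # iterated twice below, so materialize generators
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(transcribe, paths)))
```

For example, transcribe_many(find_audio_files("/path/to/audio")) returns a dict mapping each file path to the JSON response for that file. A thread pool fits here because the client spends most of its time waiting on network I/O.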

Local testing

Before launching the API, be sure to install torch and torchaudio on your machine.

pip install --upgrade torch==1.13.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
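To confirm that the installation can see the GPU, here is a small sketch of a check. The function name torch_gpu_status and the keys of the returned dict are hypothetical; it only uses torch.__version__, torch.version.cuda, torch.cuda.is_available() and torchaudio.__version__, and reports None for any package that is not installed.

```python
import importlib.util


def torch_gpu_status() -> dict:
    """Report whether torch/torchaudio are installed and whether CUDA is usable."""
    status = {
        "torch_version": None,
        "torchaudio_version": None,
        "cuda_version": None,  # CUDA version torch was built against
        "cuda_available": False,  # True only if a GPU and driver are visible
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        status["torch_version"] = torch.__version__
        status["cuda_version"] = torch.version.cuda  # None for CPU-only builds
        status["cuda_available"] = torch.cuda.is_available()
    if importlib.util.find_spec("torchaudio") is not None:
        import torchaudio

        status["torchaudio_version"] = torchaudio.__version__
    return status
```

After the pip command above, torch_gpu_status() should report a torch version ending in +cu117 and, on a machine with a working NVIDIA driver, cuda_available set to True.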

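Then launch the API with the following command. Note that uvicorn listens on port 8000 by default, so either add --port 5001 or replace 5001 with 8000 in the test examples above.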

poetry run uvicorn wordcab_transcribe.main:app --reload

🚀 Contributing

Getting started

  1. Clone the repo.
git clone
cd wordcab-transcribe
  2. Install dependencies and start coding.
poetry install
poetry shell

# install pre-commit hooks
nox --session=pre-commit -- install

# open your IDE
code .
  3. Run tests.
# run all tests
nox

# run a specific session
nox --session=tests  # run tests
nox --session=pre-commit  # run pre-commit hooks

# run a specific test
nox --session=tests -- -k test_something

Working workflow

  1. Create an issue for the feature or bug you want to work on.
  2. Create a branch from the issue page on GitHub (in the Development section of the sidebar).
  3. Run git fetch, then git checkout the branch.
  4. Make changes and commit.
  5. Push the branch to GitHub.
  6. Create a pull request and ask for review.
  7. Merge the pull request when it's approved and CI passes.
  8. Delete the branch.
  9. Update your local repo with git fetch and git pull.

GitHub Workflow
