FastAPI-based API for transcribing audio files using faster-whisper and pyannote-audio
More details on this project are available in this blog post.
- 🤗 Open-source: Our project is open-source and based on open-source libraries, allowing you to customize and extend it as needed.
- ⚡ Fast: The faster-whisper library and CTranslate2 make audio processing incredibly fast compared to other implementations.
- 🐳 Easy to deploy: You can deploy the project on your workstation or in the cloud using Docker.
- 🔥 Batch requests: The API supports batch requests, so you can transcribe multiple audio files at once.
- 💸 Cost-effective: Because the project is open-source, you won't have to pay for costly ASR platforms.
- 🫶 Easy-to-use API: With just a few lines of code, you can use the API to transcribe audio files or even YouTube videos.
- Linux (tested on Ubuntu Server 22.04)
- Python 3.9
- Docker
- NVIDIA GPU + NVIDIA Container Toolkit
To learn more about the prerequisites to run the API, check out the Prerequisites section of the blog post.
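As a rough pre-flight check for the prerequisites listed above, a short script can report which tools are visible on the current machine. This is only a sketch: the function name `check_prerequisites` is hypothetical, it treats any Python version from 3.9 upward as acceptable (an assumption), and finding `nvidia-smi` on the PATH only hints that a driver is installed. It does not verify the NVIDIA Container Toolkit.

```python
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # The list above names Python 3.9; newer versions are accepted here (assumption).
        "python>=3.9": sys.version_info >= (3, 9),
        # shutil.which returns None when the executable is not on PATH.
        "docker": shutil.which("docker") is not None,
        # nvidia-smi ships with the NVIDIA driver; this is a rough GPU hint only.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```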
Build the image.
docker build -t wordcab-transcribe:latest .
Run the container.
docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
-p 5001:5001 \
wordcab-transcribe:latest
Once the container is running, you can test the API.
The API documentation is available at http://localhost:5001/docs.
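The container may need some time after `docker run` before it answers requests. One way to wait for it is to poll the documentation URL until it responds. The sketch below uses only the standard library; the helper names `is_up` and `wait_until_ready`, the number of attempts, and the delay are illustrative assumptions, not part of the project.

```python
import time
import urllib.error
import urllib.request


def is_up(url):
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url, attempts=30, delay=2.0, probe=is_up, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `attempts` run out.

    `probe` and `sleep` are parameters so the loop can be exercised without a network.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try, but not after the last one
    return False
```

Calling `wait_until_ready("http://localhost:5001/docs")` returns True as soon as the docs page is reachable and False if it never comes up within the allotted attempts.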
- Audio file (curl):
curl -X 'POST' \
'http://localhost:5001/api/v1/audio' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@/path/to/audio/file.wav'
- YouTube video (curl):
curl -X 'POST' \
'http://localhost:5001/api/v1/youtube' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://youtu.be/dQw4w9WgXcQ"
}'
- Audio file (Python):
import requests
filepath = "/path/to/audio/file.wav" # or mp3
# Open the file in a context manager so it is closed after the upload.
with open(filepath, "rb") as audio:
    response = requests.post("http://localhost:5001/api/v1/audio", files={"file": audio})
print(response.json())
- YouTube video (Python):
import requests
url = "https://youtu.be/dQw4w9WgXcQ"
data = {"url": url}
response = requests.post("http://localhost:5001/api/v1/youtube", json=data)
print(response.json())
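The two Python calls above can be wrapped in small helpers that add a timeout and raise on HTTP errors. This is a sketch under assumptions: the function names, the `post` parameter (which lets a stub replace `requests.post`), and the 600-second timeout are illustrative choices. Only the two endpoint paths and the request shapes come from the examples above, and the structure of the returned JSON is not assumed.

```python
def _resolve_post(post):
    """Return the given callable, or requests.post when none is supplied."""
    if post is None:
        import requests  # imported lazily so a stub can be used without requests installed
        return requests.post
    return post


def transcribe_file(filepath, base_url="http://localhost:5001", post=None, timeout=600):
    """Upload an audio file to /api/v1/audio and return the decoded JSON body."""
    post = _resolve_post(post)
    with open(filepath, "rb") as audio:  # closed automatically after the upload
        response = post(f"{base_url}/api/v1/audio", files={"file": audio}, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx answers into exceptions
    return response.json()


def transcribe_youtube(video_url, base_url="http://localhost:5001", post=None, timeout=600):
    """Send a YouTube URL to /api/v1/youtube and return the decoded JSON body."""
    post = _resolve_post(post)
    response = post(f"{base_url}/api/v1/youtube", json={"url": video_url}, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

With the container running, `transcribe_youtube("https://youtu.be/dQw4w9WgXcQ")` performs the same request as the example above, but fails loudly instead of printing an error payload.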
To run the API locally instead of in Docker, first install torch and torchaudio on your machine.
pip install --upgrade torch==1.13.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
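Then launch the API with the following command. Uvicorn listens on http://127.0.0.1:8000 by default, so add --port 5001 if you want the port used in the examples above.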
poetry run uvicorn wordcab_transcribe.main:app --reload
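If the API fails to start or runs slowly, it can help to confirm that the torch install from the step above is visible to Python and can see the GPU. The snippet below is a minimal sketch; `describe_torch` is a hypothetical helper name, and it relies only on `torch.__version__` and `torch.cuda.is_available()`.

```python
def describe_torch():
    """Summarise the installed torch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    cuda = "available" if torch.cuda.is_available() else "not available"
    return f"torch {torch.__version__}, CUDA {cuda}"


print(describe_torch())
```

A report that CUDA is not available usually points to a CPU-only wheel or a driver problem rather than to the API itself.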
- Clone the repo
git clone
cd wordcab-transcribe
- Install dependencies and start coding
poetry install
poetry shell
# install pre-commit hooks
nox --session=pre-commit -- install
# open your IDE
code .
- Run tests
# run all tests
nox
# run a specific session
nox --session=tests # run tests
nox --session=pre-commit # run pre-commit hooks
# run a specific test
nox --session=tests -- -k test_something
- Create an issue for the feature or bug you want to work on.
- Create a branch using the left panel on GitHub.
- Run git fetch and git checkout the branch.
- Make changes and commit.
- Push the branch to GitHub.
- Create a pull request and ask for review.
- Merge the pull request when it's approved and CI passes.
- Delete the branch.
- Update your local repo with git fetch and git pull.