
Whisper transcription app big performance regression #712

Open
thundergolfer opened this issue Apr 20, 2024 · 9 comments

@thundergolfer (Collaborator)

https://modal-com.slack.com/archives/C069RAH7X4M/p1713624663717089

@thundergolfer (Collaborator, Author)

A one-hour podcast used to take ~1 minute to transcribe, so this is a big drop in performance.

@thundergolfer (Collaborator, Author)

I think the first thing to do is to replace the use of NFS.

@ahxxm commented May 18, 2024

It would be great if the official example used WhisperX: it can transcribe a one-hour podcast in ~1 minute using only one container (more specifically, one graphics card with 16 GB of VRAM, running large-v3), instead of spinning up 100–300 containers for a single transcription.
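The 100–300 container figure comes from the fan-out design, where the audio is split into short segments and each segment is transcribed in its own container. A minimal sketch of that arithmetic (the fixed segment length here is an illustrative assumption, not the official example's actual value):

```python
# Sketch of the fan-out math behind a chunked transcription pipeline.
# ASSUMPTION: the segment length is illustrative, not the example's real setting.
import math

def num_containers(audio_seconds: float, segment_seconds: float) -> int:
    """Number of per-segment transcription tasks (one container each)."""
    return math.ceil(audio_seconds / segment_seconds)

# A one-hour podcast split into ~20-second segments lands in the
# 100-300 container range described above:
print(num_containers(60 * 60, 20))  # -> 180
```

By contrast, a single-container WhisperX run keeps `num_containers` at 1 and relies on batched inference on one GPU instead of horizontal fan-out.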

@ahxxm commented May 18, 2024

I made a proof-of-concept repo here: https://github.com/ahxxm/serverless-audio-transcriber

@ahxxm commented Oct 5, 2024

I've been dogfooding my own version for a while; this is how it looks with an A10G.

The audio files I sent range from 30 to 70 minutes, and I scheduled them at a 5-minute interval.


This works out to around $0.01–$0.02 per hour of transcription. I wonder how the current official approach compares, before and after the regression — is it cheaper and faster?
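That per-hour figure can be sanity-checked with simple arithmetic. The GPU hourly rate below is an illustrative assumption (published A10G pricing has varied), but at the ~60x-realtime throughput mentioned earlier in the thread, the result lands inside the quoted $0.01–$0.02 range:

```python
# Sanity-check of the ~$0.01-$0.02 per transcribed-hour figure.
# ASSUMPTION: the A10G rate below is illustrative, not an exact price quote.
A10G_DOLLARS_PER_HOUR = 1.10  # hypothetical on-demand GPU rate

def cost_per_audio_hour(gpu_rate: float, realtime_factor: float) -> float:
    """Dollar cost to transcribe one hour of audio.

    realtime_factor: hours of audio transcribed per GPU-hour
    (one hour of audio in ~1 minute -> factor of ~60).
    """
    return gpu_rate / realtime_factor

print(round(cost_per_audio_hour(A10G_DOLLARS_PER_HOUR, 60), 4))  # -> 0.0183
```

Cold-start overhead (the 20+ seconds of weight loading discussed below) would push the effective cost up for short clips, since the GPU bills while idle during load.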

@thundergolfer (Collaborator, Author) commented Oct 5, 2024

Thanks @ahxxm, this is awesome, especially the benchmarks you've listed. It looks like Runpod is the cheapest? I'd bet that the RTX A4500 $/hr rate is cheaper than our A10G.

@ahxxm commented Oct 5, 2024

Yeah, the A4500 is new, performant (for Whisper), and cheap (Modal's T4 price); it would be great if Modal also supported it.

I think it's comparable, but not a completely fair comparison: Runpod has FlashBoot, which loads the weights within 1 second instead of 20+ seconds, so I added a bit more CPU and memory to the Modal code.

@gongouveia

@thundergolfer @ahxxm Hello, thank you for this very informative thread. I would like to ask how I could use my own model with Modal — how can I send my file to the container image?

@ahxxm commented Nov 9, 2024

@gongouveia I doubt this is relevant to the issue, but are you asking how to send a custom model/weights into the Docker image, or how to send the file you want to transcribe? (If the latter, see the examples in this repo, or in mine.)
