WhisperKit Benchmarks #243
atiorh announced in Announcements
-
Note: Higher performance (speed) with WhisperKit is possible. However, the benchmark data reflects the recommended (default) configuration, which best balances battery life, thermal sustainability, memory consumption, and latency for a smooth user experience. For example, on M2 Ultra, WhisperKit runs the latest OpenAI Large V3 Turbo model (v20240930/turbo in WhisperKit) as fast as 72x real-time with a GPU+ANE config, but the default config (ANE only) is published as 42x real-time on the benchmarks.
M2_Ultra_large_v3_turbo.mov
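For illustration, here is a minimal Swift sketch of how such a configuration choice might look. The `ModelComputeOptions` type, its parameter names, the `computeOptions:` argument, and the model identifier are assumptions about WhisperKit's public Swift API rather than the exact benchmark setup; check the current release before copying.

```swift
import CoreML
import WhisperKit

// Sketch of the trade-off described above. ModelComputeOptions, its
// parameter names, and the computeOptions: argument are assumptions
// about WhisperKit's API; verify against the version you are using.

// Default-style configuration: keep the encoder/decoder on the Neural
// Engine, favoring battery life and sustained thermals.
let defaultCompute = ModelComputeOptions(
    audioEncoderCompute: .cpuAndNeuralEngine,
    textDecoderCompute: .cpuAndNeuralEngine
)

// Peak-speed configuration: allow every compute unit (CPU, GPU, and ANE),
// trading power and thermal headroom for throughput.
let fastCompute = ModelComputeOptions(
    audioEncoderCompute: .all,
    textDecoderCompute: .all
)

func transcribeWithDefaultConfig(path: String) async throws {
    // "large-v3_turbo" is an illustrative model identifier, not necessarily
    // the exact name used by WhisperKit's model repository.
    let pipe = try await WhisperKit(model: "large-v3_turbo",
                                    computeOptions: defaultCompute)
    let result = try await pipe.transcribe(audioPath: path)
    print(result)
}
```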
-
We are thrilled to announce our comprehensive benchmark suite for WhisperKit!
Benchmarks (Hugging Face Space)
Detailed Announcement (Twitter)
The benchmarks will be updated with every release starting with WhisperKit 0.9!
Performance (speed) is reported on long-form ("from file" proxy) and short-form ("streaming" proxy) audio. The test data used in the benchmarks is published on Hugging Face, and the benchmarks are reproducible by following the instructions in BENCHMARKS.md.
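As a rough illustration of how an "Nx real-time" figure is derived (audio duration divided by wall-clock transcription time), the sketch below measures a single file. It is not the harness from BENCHMARKS.md, and the `WhisperKit(model:)` and `transcribe(audioPath:)` calls are assumptions about the Swift API.

```swift
import Foundation
import WhisperKit

// Minimal sketch: estimate a real-time factor (RTF) for one file.
// RTF = audio duration / wall-clock transcription time, so 42x real-time
// means one hour of audio is transcribed in well under two minutes.
func measureSpeedFactor(audioPath: String,
                        audioDurationSeconds: Double) async throws -> Double {
    // Model name is an illustrative placeholder.
    let pipe = try await WhisperKit(model: "large-v3_turbo")
    let start = Date()
    _ = try await pipe.transcribe(audioPath: audioPath)
    let elapsed = Date().timeIntervalSince(start)
    return audioDurationSeconds / elapsed
}
```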
Quality is reported across 3 datasets and 77 languages using WER and other metrics. Both the speech-to-text and language detection tasks are evaluated.
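For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. Below is a minimal sketch (it omits the text normalization that benchmark pipelines typically apply before scoring):

```swift
// Word Error Rate (WER) sketch: word-level edit distance between a
// reference transcript and a hypothesis, divided by reference length.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    // Conventions for degenerate inputs: empty reference scores 0 only if
    // the hypothesis is also empty; an empty hypothesis scores 1.
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    guard !hyp.isEmpty else { return 1 }

    // Standard dynamic-programming edit distance (substitutions,
    // insertions, and deletions all cost 1).
    var dist = Array(repeating: Array(repeating: 0, count: hyp.count + 1),
                     count: ref.count + 1)
    for i in 0...ref.count { dist[i][0] = i }
    for j in 0...hyp.count { dist[0][j] = j }
    for i in 1...ref.count {
        for j in 1...hyp.count {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            dist[i][j] = min(dist[i - 1][j] + 1,        // deletion
                             dist[i][j - 1] + 1,        // insertion
                             dist[i - 1][j - 1] + cost) // substitution
        }
    }
    return Double(dist[ref.count][hyp.count]) / Double(ref.count)
}
```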
Device Support data is also published so developers can build presets for WhisperKit that best fit each end-user device while maximizing speed and/or accuracy. Raw data here.
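As a sketch of what such a preset could look like, the snippet below maps device identifiers to model choices. The device prefixes and model names are illustrative placeholders chosen for this example, not values taken from the published device-support table.

```swift
// Hypothetical preset table informed by published Device Support data.
// Model identifiers and device-identifier prefixes are placeholders.
struct WhisperKitPreset {
    let model: String
    let description: String
}

func preset(forDeviceIdentifier id: String) -> WhisperKitPreset {
    switch id {
    case let d where d.hasPrefix("Mac"):
        // Apple silicon Macs: favor accuracy with the largest model.
        return WhisperKitPreset(model: "large-v3_turbo", description: "Max accuracy")
    case let d where d.hasPrefix("iPhone16"):
        // Recent iPhones: balance accuracy against memory and thermals.
        return WhisperKitPreset(model: "distil-large-v3", description: "Balanced")
    default:
        // Older or unknown devices: favor speed and broad compatibility.
        return WhisperKitPreset(model: "base", description: "Max compatibility")
    }
}
```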
Looking forward to the community feedback!