This document describes how to run the benchmarks for WhisperKit. The benchmarks can be run on a specific device or all connected devices. The results are saved in JSON files and can be uploaded to the argmaxinc/whisperkit-evals-dataset dataset on HuggingFace as a pull request. Below are the steps to run the benchmarks locally in order to reproduce the results shown in our WhisperKit Benchmarks space.
To download the code to run the test suite, run:

```bash
git clone git@github.com:argmaxinc/WhisperKit.git
```
Before running the benchmarks, you'll need to set up your local environment with the necessary dependencies. To do this, run:

```bash
make setup
```
See Contributing for more information.
When running the tests, the model to test is provided to Xcode by Fastlane as an environment variable:
- Open the example project:

  ```bash
  xed Examples/WhisperAX
  ```

- At the top, you will see the app icon with `WhisperAX` written next to it. Click on `WhisperAX` and select `Edit Scheme` at the bottom.
- Under `Environment Variables`, you will see an entry with `MODEL_NAME` as the name and `$(MODEL_NAME)` as the value.
> [!IMPORTANT]
> An active developer account is required to run the tests on physical devices.

Before running tests, all external devices need to be connected and paired to your Mac, as well as registered with your developer account. Ensure the devices are in Developer Mode. If nothing appears after connecting the devices via cable, press `Command + Shift + 2` in Xcode to open the list of devices and track their progress.
The datasets for the test suite can be set in a global array called `datasets` in the file `Tests/WhisperKitTests/RegressionTests.swift`. It is prefilled with the datasets that are currently available.
The models for the test suite can be set in the `Fastfile`. Simply find `BENCHMARK_CONFIGS` and modify the `models` array under the benchmark you want to run.
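As a rough illustration of what to look for, the sketch below shows the general shape of such a configuration hash. The keys and model names here are hypothetical placeholders, not copied from the actual `Fastfile`; check the real `BENCHMARK_CONFIGS` for the available benchmarks and models.

```ruby
# Hypothetical sketch of a benchmark configuration hash.
# The structure and model names are illustrative only -- the real
# BENCHMARK_CONFIGS in the Fastfile is the source of truth.
BENCHMARK_CONFIGS = {
  full: {
    models: ["tiny", "base", "small"]
  },
  debug: {
    models: ["tiny"]
  }
}

# To benchmark a different set of models, edit the :models array
# under the configuration you plan to run.
puts BENCHMARK_CONFIGS[:debug][:models].inspect
```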
The tests are run using Fastlane, which is controlled by a Makefile. The Makefile contains the following commands:
Before running the tests, it might be a good idea to list the connected devices to resolve any connection issues. Simply run:

```bash
make list-devices
```
The output will be a list with entries that look something like this:

```ruby
{
  :name=>"My Mac",
  :type=>"Apple M2 Pro",
  :platform=>"macOS",
  :os_version=>"15.0.1",
  :product=>"Mac14,12",
  :id=>"XXXXXXXX-1234-5678-9012-XXXXXXXXXXXX",
  :state=>"connected"
}
```
Verify that the devices are connected and the state is `connected`.
After completing the above steps, you can run the tests. Note that there are two different test configurations: one named `full` and the other named `debug`. To check for potential errors, run the `debug` tests:

```bash
make benchmark-devices DEBUG=true
```

Otherwise, run the `full` tests:

```bash
make benchmark-devices
```
Optionally, for both tests, you can specify the list of devices for the tests using the `DEVICES` option:

```bash
make benchmark-devices DEVICES="iPhone 15 Pro Max,My Mac"
```

The `DEVICES` option is a comma-separated list of device names. The device names can be found by running `make list-devices` and using the value for the `:name` key.
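To build that comma-separated list programmatically, you could filter the `make list-devices` entries for connected devices and join their names. The sketch below uses illustrative sample data shaped like the example output above:

```ruby
# Sketch: derive a DEVICES string from parsed list-devices entries.
# The device data here is illustrative sample data, shaped like the
# example output shown earlier.
devices = [
  { name: "My Mac", platform: "macOS", state: "connected" },
  { name: "iPhone 15 Pro Max", platform: "iOS", state: "connected" },
  { name: "Old iPad", platform: "iOS", state: "unavailable" }
]

# Keep only connected devices and join their names with commas,
# matching the format expected by the DEVICES option.
devices_arg = devices
  .select { |d| d[:state] == "connected" }
  .map { |d| d[:name] }
  .join(",")

puts %(make benchmark-devices DEVICES="#{devices_arg}")
# → make benchmark-devices DEVICES="My Mac,iPhone 15 Pro Max"
```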
After the tests are run, the generated results can be found under `fastlane/benchmark_data`, including the `.xcresult` file with logs and attachments for each device. There will also be a folder called `fastlane/upload_folder/benchmark_data` that contains only the JSON results from `fastlane/benchmark_data`, which can be used for further analysis.
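As a starting point for such analysis, the sketch below parses benchmark results and prints a per-device summary. The field names (`device`, `speed_factor`) are hypothetical placeholders, not the actual result schema; inspect the files under `fastlane/upload_folder/benchmark_data` to see the real fields.

```ruby
require "json"

# Sketch: summarize benchmark JSON results. The schema here
# ("device", "speed_factor") is a hypothetical placeholder --
# the sample string stands in for a file read from
# fastlane/upload_folder/benchmark_data.
sample = <<~JSON
  [
    { "device": "My Mac", "speed_factor": 42.0 },
    { "device": "iPhone 15 Pro Max", "speed_factor": 30.0 }
  ]
JSON

results = JSON.parse(sample)

# Print one summary line per device.
results.each do |r|
  puts "#{r['device']}: #{r['speed_factor']}x"
end
```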
We will periodically run these tests on a range of devices and upload the results to the argmaxinc/whisperkit-evals-dataset, which will propagate to the WhisperKit Benchmarks space and be available for comparison.
If you encounter issues while running the tests, here are a few things to try:
- Open the project in Xcode and run the tests directly from there.
  - To do this, open the example app (from the command line: `xed Examples/WhisperAX`) and run the test named `RegressionTests/testModelPerformanceWithDebugConfig` from the test navigator.
  - If the tests run successfully, you can rule out any issues with the device or the models.
  - If they don't run successfully, Xcode will provide more detailed error messages.
- Try specifying a single device to run the tests on. This can be done by running `make list-devices` and then running the tests with the `DEVICES` option set to the name of the device you want to test on. For example, `make benchmark-devices DEVICES="My Mac"`. This will also enable you to see the logs for that specific device.
- If you are still encountering issues, please reach out to us on Discord or create an issue on GitHub.