
Releases: Macoron/whisper.unity

1.3.2

03 Aug 07:49
fb21679

Minor release. Fixed Metal support on MacOS.

What's Changed

  • Update version string in package.json to 1.3.1 by @from2001 in #86
  • Use the new WHISPER_METAL_EMBED_LIBRARY flag to embed the metal lib by @injeniero in #93
  • Updated MacOS binaries (fix Metal support) by @Macoron in #94

New Contributors

Full Changelog: 1.3.1...1.3.2

1.3.1

09 May 08:48
30f9e11

New minor release. Includes an update of whisper.cpp to 1.5.5 and bug fixes.

What's Changed

  • Fixed out of bounds exception during resampling by @Macoron in #74
  • Add visionOS support by @Macoron in #75
  • Added missing Accelerate framework by @Macoron in #76
  • Update README.md with VisionOS support by @yosun in #77
  • Updated whisper.cpp to 1.5.5 by @Macoron in #84

New Contributors

  • @yosun made their first contribution in #77

Full Changelog: 1.3.0...1.3.1

1.3.0 - GPU Support

30 Nov 21:54
25c8d26

This release introduces the whisper.cpp update to 1.5.1, GPU inference support, and other minor improvements.

Whisper.cpp updated to 1.5.1

whisper.cpp 1.5.1 brings many improvements and bug fixes, including better GPU usage.

Check the original whisper.cpp release notes for more information.

GPU Support

Whisper now supports GPU acceleration. This can drastically improve performance on some hardware.

Model       CPU         CUDA
tiny        1188 ms     185 ms
small       8992 ms     517 ms
large-v2    60325 ms    1946 ms

Time to transcribe "jfk.wav" on Windows with an Intel Core i5-12400F and an Nvidia GeForce RTX 2070 Super.

Model       CPU         Metal
tiny        1113 ms     189 ms
small       6319 ms     860 ms
large-v2    40608 ms    3888 ms

Time to transcribe "jfk.wav" on an Apple M1 Pro.
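
The speedups implied by the two tables above can be worked out directly. The snippet below is only an illustrative calculation on the published timings; the names `TIMINGS_MS`, `speedup` and `speedup_table` are made up for this example and are not part of whisper.unity.

```python
# Timings in milliseconds as (CPU, GPU), copied from the tables above.
TIMINGS_MS = {
    "CUDA": {
        "tiny": (1188, 185),
        "small": (8992, 517),
        "large-v2": (60325, 1946),
    },
    "Metal": {
        "tiny": (1113, 189),
        "small": (6319, 860),
        "large-v2": (40608, 3888),
    },
}


def speedup(cpu_ms, gpu_ms):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_ms / gpu_ms


def speedup_table(timings):
    """Map each backend and model to its speedup, rounded to one decimal."""
    return {
        backend: {
            model: round(speedup(cpu, gpu), 1)
            for model, (cpu, gpu) in models.items()
        }
        for backend, models in timings.items()
    }


if __name__ == "__main__":
    for backend, models in speedup_table(TIMINGS_MS).items():
        for model, factor in models.items():
            print(f"{backend:5s} {model:8s} {factor}x")
```

On these measurements the gain grows with model size: roughly 6x for tiny and 31x for large-v2 with CUDA, and roughly 6x to 10x with Metal.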

On Windows and Linux you need an Nvidia GPU and an installed CUDA Toolkit (tested with 12.2.0). A Unity project compiled with CUDA enabled expects your end users to have an Nvidia GPU and the CUDA libraries. Running such a build without them will result in an error.

On MacOS you need an ARM CPU (M1 or newer). iOS Metal inference isn't supported yet. On Intel or older hardware, whisper.cpp should fall back to CPU inference.

To activate GPU inference, go to Project Settings => Whisper => Enable CUDA or Enable Metal. For more information, check the README.

Other

The Ubuntu libs are now compiled on Ubuntu 20.04. This might cause problems on Ubuntu 18.04. If you need support for earlier versions of Ubuntu or for other distros, consider recompiling the libs from source.

A new loop mode for the microphone was added. It creates an endless stream using Unity's built-in circular microphone loop, which is very useful for whisper streaming transcription. To activate it, set Loop in MicrophoneRecord to "true".
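
Reading from a looping recording means handling the point where the write position wraps back to the start of the buffer. The sketch below shows that wrap-around logic in plain Python. It is not whisper.unity code: `read_new_samples`, `last_pos` and `write_pos` are hypothetical names chosen for illustration, and the sketch assumes the recorder has not overwritten samples that were never read.

```python
def read_new_samples(ring, last_pos, write_pos):
    """Return the samples written to a circular buffer since the last read.

    ring      -- fixed-size list that the recorder keeps overwriting
    last_pos  -- index where the previous read stopped
    write_pos -- index where the recorder will write next

    Assumption: less than one full buffer was written between reads,
    so equal positions mean "nothing new".
    """
    if write_pos >= last_pos:
        # No wrap: the new samples are one contiguous slice.
        return ring[last_pos:write_pos]
    # Wrap: take the end of the buffer, then continue from the start.
    return ring[last_pos:] + ring[:write_pos]


if __name__ == "__main__":
    ring = [0, 1, 2, 3, 4, 5, 6, 7]
    print(read_new_samples(ring, 2, 5))  # contiguous read
    print(read_new_samples(ring, 6, 2))  # read across the wrap point
```

In Unity, the write position would typically come from `Microphone.GetPosition`; how MicrophoneRecord uses it internally is not described in these notes.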

What's Changed

Full Changelog: 1.2.1...1.3.0

1.2.1

25 Aug 09:54
8725359

This release introduces VAD and some other minor improvements.

Voice Activity Detection (VAD)

Voice Activity Detection (VAD) was added to this project. It lets you check whether the current audio contains any speech. For example, you can finish microphone input when the user stops speaking.

output.mp4

The implementation of the VAD is very basic: it is a direct port of the energy-based VAD from whisper.cpp. Don't expect it to be very robust, but as a proof of concept it should work fine.
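
To give a feel for what "energy-based" means here, the sketch below compares the average amplitude of the most recent audio with the average over the whole buffer. It is a simplified illustration, not the project's or whisper.cpp's actual code: the function names, the `last_ms` window, and the `threshold` value are all assumptions, and any filtering step is left out.

```python
def mean_abs(samples):
    """Average absolute amplitude, used here as a crude energy measure."""
    return sum(abs(s) for s in samples) / len(samples)


def is_speech_in_tail(samples, sample_rate, last_ms=1000, threshold=0.6):
    """Guess whether the last `last_ms` of audio still contains speech.

    The tail counts as speech when its energy is above `threshold`
    times the energy of the whole buffer. Both defaults are
    illustrative values, not settings taken from whisper.unity.
    """
    n_last = sample_rate * last_ms // 1000
    if n_last <= 0 or n_last >= len(samples):
        # Not enough audio to compare the tail with the rest.
        return False
    energy_all = mean_abs(samples)
    energy_last = mean_abs(samples[-n_last:])
    return energy_last > threshold * energy_all


if __name__ == "__main__":
    rate = 1000  # tiny sample rate to keep the example small
    loud = [0.5, -0.5] * 1000   # 2 s of "speech"
    quiet = [0.0] * 1000        # 1 s of silence
    print(is_speech_in_tail(loud + quiet, rate))  # user stopped speaking
    print(is_speech_in_tail(quiet + quiet + loud[:1000], rate))  # still speaking
```

A recorder could use a check like this to stop listening once the tail of the buffer no longer looks like speech. Because it only looks at loudness, background noise can easily fool it.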

VAD Streaming

Streaming now supports VAD. This should drastically reduce hallucinations that were caused by silent audio regions.

output_novad.mp4

ggml.base.en, VAD disabled

output_vad.mp4

ggml.base.en, VAD enabled

What's Changed

Full Changelog: 1.2.0...1.2.1

1.2.0

25 Jul 21:05
36526a3

New major release with a lot of changes.

whisper.cpp updated to 1.4.2

While 1.4.2 is technically still in beta, it has been available for several months and seems to be stable. The quality of transcription shouldn't have changed, but some results look different compared to previous versions. If this is critical for you, consider using a previous release.

Prompting

Whisper.unity now supports prompting. Prompting helps you "guide" the transcription style, names, or specific terminology. It isn't as powerful as prompting an LLM, but you can get really interesting results with it.

Streaming

output.mp4

The first version of transcription streaming was added. Transcription now updates in real time, using a microphone or an audio stream. This is mostly a direct port of the original whisper.cpp demo, except for VAD.

What's Changed

New Contributors

Full Changelog: 1.1.1...1.2.0

1.1.1

04 Jun 10:58
9ffa632

Minor release. Adds prebuilt Linux binaries and GitHub Actions tests/builds.

What's Changed

Full Changelog: 1.1.0...1.1.1

1.1.0

29 Apr 14:36
cb7a5c3

This release adds timestamps and confidence data for segments and tokens. It changes the signature of the OnNewSegment event and the WhisperResult class, so make sure to update your code if you use them.

What's Changed

Demos

Segment timestamp prediction

subtitles.mp4

whisper.tiny in subtitles demo, color shows confidence level for each token

Full Changelog: 1.0.3...1.1.0

1.0.3

21 Apr 18:08
ddec093

What's Changed

New Contributors

Language detection example

image

Full Changelog: 1.0.2...1.0.3

1.0.2

12 Apr 21:12
dce5dae

What's Changed

  • Add basic unit testing in #5
  • Text segments streaming in #6
  • Minor readme changes

Full Changelog: 1.0.1...1.0.2

Text segment streaming

text-streaming.mp4

1.0.1

08 Apr 10:24
c11254d

What's Changed

  • Expose more whisper parameters in #3
  • Faster Android inference in #4

Full Changelog: 1.0.0...1.0.1