Video Subtitle Generation with OpenAI Whisper

Whisper is a general-purpose speech recognition model from OpenAI. The model transcribes speech in dozens of languages with high accuracy and can handle poor audio quality or heavy background noise. This notebook runs the model with OpenVINO to generate a transcription of a video.

Notebook Contents

This notebook demonstrates how to generate video subtitles using the open-source Whisper model. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. You can find more information about this model in the research paper, OpenAI blog, model card and GitHub repository.

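This folder contains two notebooks that show how to convert and quantize the model with OpenVINO:

Convert Whisper model using OpenVINO (227-whisper-convert.ipynb)
Quantize OpenVINO Whisper model using NNCF (227-whisper-nncf-quantize.ipynb)

In these notebooks, you will use Whisper to generate subtitles for a video.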

Convert Whisper model using OpenVINO

The first notebook contains the following steps:

Download the model.
Instantiate the original PyTorch model pipeline.
Convert the model to OpenVINO IR using the model conversion API.
Run the Whisper pipeline with OpenVINO.
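
The conversion step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the `openai-whisper`, `torch` and `openvino` (2023.1 or newer) packages are installed, converts only the encoder, and uses a hypothetical function name, model size and output path. The heavy imports are placed inside the function so the sketch loads without them.

```python
from pathlib import Path


def convert_encoder_to_ir(model_name: str = "base", out_dir: str = "ir") -> Path:
    """Convert the Whisper encoder to OpenVINO IR and return the .xml path.

    Hypothetical helper for illustration; the notebook may structure this differently.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import openvino as ov
    import torch
    import whisper

    # Download the checkpoint (on first use) and build the PyTorch model.
    model = whisper.load_model(model_name, device="cpu")
    model.eval()

    # The encoder takes a log-Mel spectrogram of one 30-second window:
    # (batch, n_mels, frames). 3000 frames is assumed here.
    mel = torch.zeros((1, model.dims.n_mels, 3000))

    # Trace the PyTorch module and convert it with the model conversion API.
    encoder_ir = ov.convert_model(model.encoder, example_input=mel)

    xml_path = Path(out_dir) / f"whisper_{model_name}_encoder.xml"
    xml_path.parent.mkdir(parents=True, exist_ok=True)
    ov.save_model(encoder_ir, str(xml_path))
    return xml_path


def compile_encoder(xml_path: Path, device: str = "CPU"):
    """Load the saved IR and compile it for the given device."""
    import openvino as ov

    core = ov.Core()
    return core.compile_model(core.read_model(str(xml_path)), device)
```

The decoder would need a similar conversion with its own example inputs, which this sketch leaves out.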

A simplified demo pipeline is represented in the diagram below:

The final output of running this notebook is an SRT file (a popular video captioning format) with subtitles for a sample video downloaded from YouTube. This file can be loaded by a video player during playback or embedded directly into a video file with ffmpeg or similar tools that support subtitles.
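
To make the SRT output concrete, here is a small sketch of how transcription segments could be written in that format. It assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` string; the function names are hypothetical and the notebook's own helper may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"} (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()
        # Each cue: sequence number, time range, text, then a blank separator line.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello world."},
        {"start": 2.5, "end": 5.0, "text": " This is a subtitle."},
    ]
    print(segments_to_srt(demo))
```

Writing the returned string to a file with the `.srt` extension gives a subtitle file that most video players can load alongside the video.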

The image below shows an example of the video as input and corresponding transcription as output.

Quantize OpenVINO Whisper model using NNCF

The second notebook guides you through the steps for improving model performance with INT8 quantization using NNCF:

Quantize the converted OpenVINO model from the 227-whisper-convert notebook with NNCF.
Check the model's result on the demo video.
Compare model size, performance and accuracy of FP32 and quantized INT8 models.
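
For the accuracy part of the comparison, a common metric for speech recognition is word error rate (WER). The sketch below is an assumption about how such a check could look, not the notebook's actual evaluation code: it computes WER with a word-level edit distance using only the standard library, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic programming over one row at a time (Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not real model output.
    fp32_text = "the cat sat on the mat"
    int8_text = "the cat sat on mat"
    print(f"WER: {word_error_rate(fp32_text, int8_text):.3f}")
```

Running the same function on the FP32 and INT8 transcriptions against a ground-truth transcript gives two numbers that can be compared directly; a lower WER means higher accuracy.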

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebooks in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.
