Add whisper pipeline - Initial commit #789

Merged: 53 commits, Sep 19, 2024


@as-suvorov (Contributor) commented Aug 21, 2024

This is a work-in-progress PR. TODOs:

  • use WhisperFeatureExtractor for audio preprocessing
  • compute assets/whisper/mel_filters_data.bin on initialization
  • move wav reader to sample utils
  • Longer audio inputs (>30s) give poor-quality results at chunk borders. Long audio inputs are split into 30s chunks, which loses context at each chunk border. This could be partially solved by chunking with a stride.
  • add perf metrics
  • update docstrings
  • update documentation
  • add python bindings
  • add tests
  • add cpp, python samples tests
  • fix win build
  • fetch dr_wav.h with FetchContent
  • support different languages, language autodetection
  • support translation
  • support timestamps
  • remove constructor with infer requests
  • rename pipeline to WhisperPipeline
  • Whisper pipeline doesn't need a tokenizer; it uses the detokenizer only. Implement detokenizer-only initialization for ov::genai::Tokenizer
  • Check discrete GPU. Integrated GPU works as expected.
  • Investigate use of RemoteTensor for GPU
  • Add batch
  • Add sampler, inherit WhisperGenerationConfig from GenerationConfig
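
To make the "chunking with stride" idea from the list above concrete, here is a minimal sketch. It is not the PR's implementation: `chunk_with_stride` is a hypothetical helper, and the 5-second overlap used in the note below is an arbitrary assumption. It only computes overlapping sample ranges; merging the transcriptions of the overlapping regions is a separate problem that this sketch does not address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the PR): returns [begin, end) sample ranges
// that cover n_samples, where consecutive chunks overlap by `stride` samples.
std::vector<std::pair<std::size_t, std::size_t>> chunk_with_stride(
    std::size_t n_samples, std::size_t chunk_size, std::size_t stride) {
    // The overlap must be smaller than the chunk, otherwise we never advance.
    assert(chunk_size > 0 && stride < chunk_size);

    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t step = chunk_size - stride;
    for (std::size_t begin = 0; begin < n_samples; begin += step) {
        const std::size_t end = std::min(begin + chunk_size, n_samples);
        chunks.emplace_back(begin, end);
        if (end == n_samples) {
            break;  // the last chunk reached the end of the audio
        }
    }
    return chunks;
}
```

For 70 seconds of 16 kHz audio with 30-second chunks and a 5-second overlap, `chunk_with_stride(16000 * 70, 16000 * 30, 16000 * 5)` yields three ranges starting at samples 0, 400000 and 800000, so each border falls inside a neighbouring chunk.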

Current limitations:

  • No resampling during preprocessing. Input raw speech should have a 16 kHz sampling rate
  • No normalization during preprocessing. Input raw speech should be normalized to approximately the [-1, 1] range
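
Given the two limitations above, the caller has to prepare the input before handing it to the pipeline. A minimal sketch of that preparation, assuming the source audio is 16-bit signed PCM (the PR does not state the input format); `prepare_raw_speech` is a hypothetical name, not an API from this PR:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of the PR). Assumes 16-bit signed PCM input.
// Rejects anything that is not 16 kHz, since the pipeline does not resample,
// and scales samples into [-1, 1), since the pipeline does not normalize.
std::vector<float> prepare_raw_speech(const std::vector<std::int16_t>& pcm,
                                      int sample_rate_hz) {
    if (sample_rate_hz != 16000) {
        throw std::invalid_argument("expected 16 kHz input; resample first");
    }
    std::vector<float> raw_speech;
    raw_speech.reserve(pcm.size());
    for (std::int16_t sample : pcm) {
        // 32768 is the magnitude of the most negative int16 value.
        raw_speech.push_back(static_cast<float>(sample) / 32768.0f);
    }
    return raw_speech;
}
```

Dividing by 32768 maps the full int16 range into [-1, 1); other source formats (for example 24-bit or float WAV) would need a different scale factor.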

Tickets: CVS-147994, CVS-146010, CVS-152522

throw std::runtime_error("Failed to read WAV file " + wav_file_path);
}

ov::genai::WhisperSpeechRecognitionPipeline pipeline{model_path};

Collaborator

Does GPU work?

Contributor Author

We checked an integrated GPU with @yatarkan on his machine, and it works fine. We're trying to check a discrete one.

Contributor

Let's check separately

@ilya-lavrenov ilya-lavrenov self-assigned this Sep 3, 2024
@andrei-kochin andrei-kochin changed the title Add whisper pipeline Add whisper pipeline - Initial commit Sep 12, 2024
@as-suvorov as-suvorov marked this pull request as ready for review September 13, 2024 14:02
src/cpp/src/whisper/whisper.cpp
m_impl->m_generation_config.validate();
}

ov::genai::WhisperPipeline::~WhisperPipeline() = default;

Contributor

do we need it at all? I think you can omit dtor definition

Contributor Author

I tried to remove it, but I got a compilation error. As I understand it, using a std::unique_ptr<Impl> class member requires the class to declare a destructor that is defined where Impl is a complete type: https://stackoverflow.com/questions/34072862/why-is-error-invalid-application-of-sizeof-to-an-incomplete-type-using-uniqu
I guess there is a way to implement it without an explicit WhisperPipeline destructor; please let me know if I need to investigate.

Error:

[build] /usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = ov::genai::WhisperPipeline::Impl]’:
[build] /usr/include/c++/11/bits/unique_ptr.h:361:17:   required from ‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = ov::genai::WhisperPipeline::Impl; _Dp = std::default_delete<ov::genai::WhisperPipeline::Impl>]’
[build] /opt/home/suvorova/projects/openvino/openvino.genai/src/cpp/include/openvino/genai/whisper_pipeline.hpp:20:30:   required from here
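
For readers unfamiliar with the issue, the pattern can be reduced to the sketch below. `Pipeline` is a hypothetical stand-in for WhisperPipeline, and header and source are shown in one file for brevity. `std::default_delete` needs `Impl` to be a complete type at the point where the destructor is defined, so the destructor is declared in the class and defaulted only after `Impl` is fully defined. An implicitly generated destructor would instead be defined wherever the class is destroyed, typically in a file that sees only the forward declaration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// --- what would live in the public header ---
class Pipeline {  // hypothetical stand-in for WhisperPipeline
public:
    explicit Pipeline(std::string model_path);
    ~Pipeline();  // declared only: Impl is still incomplete here
    const std::string& model_path() const;

private:
    struct Impl;  // forward declaration
    std::unique_ptr<Impl> m_impl;
};

// --- what would live in the .cpp file ---
struct Pipeline::Impl {
    std::string model_path;
};

Pipeline::Pipeline(std::string model_path)
    : m_impl(std::make_unique<Impl>(Impl{std::move(model_path)})) {}

// Defaulted here, where Impl is complete, so default_delete can call delete.
Pipeline::~Pipeline() = default;

const std::string& Pipeline::model_path() const {
    return m_impl->model_path;
}
```

This matches the `ov::genai::WhisperPipeline::~WhisperPipeline() = default;` line under review: the out-of-line defaulted destructor is what allows the header to keep `Impl` opaque.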

Contributor

maybe we can use shared_ptr then?
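
A minimal sketch of what the shared_ptr variant would look like, with the hypothetical names `SharedPipeline` and `set_model_path` chosen only for illustration. `std::shared_ptr` stores its deleter when the object is created, so the class no longer needs a user-declared destructor. The trade-off is that the class becomes copyable and copies share one `Impl`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class SharedPipeline {  // hypothetical stand-in, not the PR's class
public:
    explicit SharedPipeline(std::string model_path);
    // No destructor declared: shared_ptr captured the deleter at creation.
    const std::string& model_path() const;
    void set_model_path(std::string path);

private:
    struct Impl;  // forward declaration
    std::shared_ptr<Impl> m_impl;
};

struct SharedPipeline::Impl {
    std::string model_path;
};

SharedPipeline::SharedPipeline(std::string model_path)
    : m_impl(std::make_shared<Impl>(Impl{std::move(model_path)})) {}

const std::string& SharedPipeline::model_path() const {
    return m_impl->model_path;
}

void SharedPipeline::set_model_path(std::string path) {
    m_impl->model_path = std::move(path);
}
```

Copying a `SharedPipeline` and calling `set_model_path` on the copy also changes the original, which may or may not be the desired semantics for a pipeline object.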

@@ -11,7 +11,10 @@
#include <openvino/runtime/auto/properties.hpp>
#include "../cpp/src/tokenizers_path.hpp"

#include "./utils.hpp"

Contributor

Suggested change
- #include "./utils.hpp"
+ #include "utils.hpp"

just add the current folder with target_include_directories in cmake

Contributor Author

Naively adding target_include_directories didn't work. It compiles successfully, but the Python code couldn't find a symbol from utils.cpp. I'll add a TODO to address that.

pybind11_add_module(py_generate_pipeline py_generate_pipeline.cpp)
target_include_directories(py_generate_pipeline PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/")

@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Sep 19, 2024
Merged via the queue into openvinotoolkit:master with commit 7b81bcb Sep 19, 2024
46 checks passed
.github/workflows/windows.yml
run: |
. "${{ env.OV_INSTALL_DIR }}/setupvars.ps1"
python -m pip install . --verbose
python -m pytest ./tests/python_tests/test_whisper_generate_api.py

Contributor

can we test only via either wheel or C++ archive?

Contributor Author

I followed the existing pattern here. I guess we can run just a single test to ensure both approaches (wheel and C++ archive) work, rather than the whole test suite. @Wovchena what are your thoughts on this?

4 participants