High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™

FreeVC converts the voice of a source speaker to a target style while keeping the linguistic content unchanged, and it requires no text annotation.

The figure below illustrates the FreeVC model architecture for inference; this notebook covers only the inference part. The model has three main parts: a Prior Encoder, a Speaker Encoder, and a Decoder. The prior encoder contains a WavLM model, a bottleneck extractor, and a normalizing flow. Detailed information is available in this paper.

Inference

*image_source

Notebook Contents

FreeVC provides only a command-line interface and runs only with CUDA. This notebook shows how to use FreeVC from Python and without CUDA devices. It consists of the following steps:

  • Download and prepare models.
  • Inference.
  • Convert models to OpenVINO Intermediate Representation.
  • Inference using only OpenVINO's IR models.
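The last two steps can be sketched as follows. This is a minimal illustration, not the notebook's own code: it assumes OpenVINO 2023.1 or newer (where `openvino.convert_model` and `openvino.save_model` are available) and a PyTorch module that can be traced with an example input. The helper names `ir_files`, `convert_to_ir`, and `load_ir` are hypothetical.

```python
from pathlib import Path


def ir_files(out_dir, model_name):
    """Return the (.xml, .bin) file pair that makes up one OpenVINO IR model."""
    xml_path = Path(out_dir) / f"{model_name}.xml"
    return xml_path, xml_path.with_suffix(".bin")


def convert_to_ir(torch_model, example_input, out_dir, model_name):
    """Convert a PyTorch module to OpenVINO IR and save it to disk.

    Assumption: OpenVINO >= 2023.1 is installed. The import is deferred so
    that this sketch can be loaded without OpenVINO present.
    """
    import openvino as ov

    xml_path, _ = ir_files(out_dir, model_name)
    xml_path.parent.mkdir(parents=True, exist_ok=True)
    # Trace the module with the example input and build an OpenVINO model.
    ov_model = ov.convert_model(torch_model, example_input=example_input)
    # save_model writes the .xml topology and the matching .bin weights file.
    ov.save_model(ov_model, str(xml_path))
    return xml_path


def load_ir(xml_path, device="CPU"):
    """Compile a saved IR model for the given device (CPU by default)."""
    import openvino as ov

    return ov.Core().compile_model(str(xml_path), device)
```

Each FreeVC component would be converted separately in this scheme, and inference would then call the compiled models in place of the PyTorch modules.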

Installation Instructions

If you have not installed all required dependencies, follow the Installation Guide.

The notebook also requires a few extra steps. You can do them manually, or the notebook performs them automatically during execution, but only to the minimum extent it needs.

  1. Clone the FreeVC repository: git clone https://github.com/OlaWod/FreeVC.git
  2. Download WavLM-Large and put it under the FreeVC/wavlm/ directory.
  3. Optionally, download the VCTK dataset. This example downloads only two audio files from the Hugging Face FreeVC example.
  4. Download the pretrained models and put them under the checkpoints directory (this example requires only freevc.pth).
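Before running the notebook after a manual setup, it can help to verify that the files from the steps above are in place. The sketch below is one way to do that with the standard library only. It makes two assumptions that the steps do not state: that the WavLM-Large file is named `WavLM-Large.pt`, and that the checkpoints directory sits inside the cloned FreeVC folder. Both are parameters, so adjust them to your layout. The function name `missing_prerequisites` is hypothetical.

```python
from pathlib import Path


def missing_prerequisites(repo_dir, wavlm_name="WavLM-Large.pt",
                          checkpoint_name="freevc.pth"):
    """Return the required model files that are not yet present.

    Assumptions (not confirmed by the steps above): the WavLM file name and
    the location of the checkpoints directory inside the cloned repository.
    """
    repo = Path(repo_dir)
    required = [
        repo / "wavlm" / wavlm_name,             # step 2: WavLM-Large weights
        repo / "checkpoints" / checkpoint_name,  # step 4: FreeVC checkpoint
    ]
    return [path for path in required if not path.is_file()]
```

For example, `missing_prerequisites("FreeVC")` returns an empty list once both files are in place; otherwise it lists the paths that still need to be downloaded.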