# Agent_smith

## Overview

This pipeline is based on the linked paper. It first creates a vector embedding of your voice input using the encoder model. This embedding conditions the synthesizer model, which generates a mel spectrogram from the text you provide. Finally, a vocoder model converts the spectrogram back into a sound file.
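Conceptually, the three models chain together as sketched below. This is a minimal illustration only; the helper names (`embed_utterance`, `synthesize_spectrogram`, `infer_waveform`) and the sample rate are assumptions, not the repository's exact API.

```python
import numpy as np
import soundfile as sf

# Illustrative imports -- the real modules live under Encoder/, Synthesizer/ and Vocoder/
from encoder import embed_utterance             # hypothetical helper
from synthesizer import synthesize_spectrogram  # hypothetical helper
from vocoder import infer_waveform              # hypothetical helper

# 1. Encode the reference voice into a fixed-size speaker embedding.
embedding = embed_utterance("samples/try.wav")

# 2. Condition the synthesizer on that embedding to turn text into a mel spectrogram.
mel = synthesize_spectrogram("Hello there, Mr. Anderson.", embedding)

# 3. Invert the mel spectrogram back into a waveform with the vocoder.
waveform = infer_waveform(mel)

# 4. Write the waveform to disk (16 kHz assumed here).
sf.write("output.wav", np.asarray(waveform, dtype=np.float32), samplerate=16000)
```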

## Architecture

## Output

Check out `output.wav` for results.

## Setup

### 1. Install Requirements

Python 3.6 or 3.7 is needed to run the toolbox.

A) Install PyTorch (>=1.0.1) first.
B) Install ffmpeg next.
C) Run `pip install -r requirements.txt` to install the remaining necessary packages.
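For example, on a Debian-based system the three steps might look like this (a sketch only; pick the PyTorch build that matches your platform and CUDA version from pytorch.org):

```bash
# A) Install PyTorch first (CPU build shown; see pytorch.org for CUDA variants)
pip install "torch>=1.0.1"

# B) Install ffmpeg next (Debian/Ubuntu shown; use your platform's package manager)
sudo apt-get install ffmpeg

# C) Install the remaining Python dependencies
pip install -r requirements.txt
```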

### 2. Download Pretrained Models

Download the latest models here.

The .zip file contains the Encoder, Vocoder, and Synthesizer models. Open each directory in the archive, copy its `saved_models` folder, and paste it inside the corresponding Encoder, Vocoder, and Synthesizer directory of the repository:

```
pretrained.zip/
    Encoder/saved_models
    Vocoder/saved_models
    Synthesizer/saved_models
```

to be copied to

```
Agent_smith/
    Encoder/
    Vocoder/
    Synthesizer/
```
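On a Unix-like system, the copy step might look like this (a sketch; the paths assume the archive is extracted next to the repository):

```bash
unzip pretrained.zip -d pretrained

# Copy each saved_models folder into the matching repository directory
cp -r pretrained/Encoder/saved_models Agent_smith/Encoder/
cp -r pretrained/Vocoder/saved_models Agent_smith/Vocoder/
cp -r pretrained/Synthesizer/saved_models Agent_smith/Synthesizer/
```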

## Run

- Run `python run.py`
- It will first run a test to make sure the setup is correct.
- The prompt will ask you for the path to the input voice file; for easier access, `try.wav` in the `samples` directory has been provided.
- Next, you can input any sentence you wish to be enunciated in the output.
- It will automatically save the output file in the parent directory.

**FOR WINDOWS USERS:** On Windows, packages like pyaudio and ffmpeg are known to cause errors. To avoid this, use run.py with the `--no_mp3_support` flag: run `python run.py --no_mp3_support`. With this flag, the input file must be in .wav format only.