Voice Conversion using Deep Learning

Download PDF

This project will be carried out at the Signal Theory and Communications Department (TSC) of the Polytechnic University of Catalonia (UPC). Specifically, it will be developed within the Speech Processing research group (VEU) as a contribution to its research project DeepVoice: Deep Learning Technologies for Speech and Audio Processing.

The purpose of this project is to develop a deep learning-based system that converts a speech signal from one speaker into a signal that sounds as if it had been uttered by a different speaker. The resulting signal must preserve the linguistic and prosodic content of the original.

Deep Learning techniques have shown remarkable results in other areas of speech processing, such as speech recognition and speech synthesis. They are often combined with more classical speech processing and modeling techniques, such as vocoder-based feature extraction, which are used for pre- and post-processing.
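As a rough illustration of how classical pre-processing and a learned model can fit together, the sketch below extracts frame-level features from a waveform and passes them through a placeholder mapping. It is only a sketch under stated assumptions: the log-magnitude spectrum stands in for real vocoder features, the single affine layer stands in for a deep network, and the function names, frame sizes and sample rate are hypothetical choices, not part of this project.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def extract_features(signal, frame_len=256, hop=128):
    """Pre-processing: log-magnitude spectrum per frame.

    Assumption: this is a simple stand-in for vocoder features.
    """
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def convert_features(features, weights, bias):
    """Placeholder for the learned model: one frame-wise affine layer."""
    return features @ weights + bias

# One second of a 220 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
source = np.sin(2 * np.pi * 220.0 * t)

features = extract_features(source)

# Identity parameters: an untrained mapping that leaves features unchanged.
n_bins = features.shape[1]
converted = convert_features(features, np.eye(n_bins), np.zeros(n_bins))
```

In a full system, the converted features would then go through a post-processing step (for example, a vocoder synthesis stage) to produce the output waveform; that step is omitted here.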

Before the system can be developed, some preliminary tasks must be completed. These mainly consist of acquiring a thorough knowledge of neural networks and how to apply them in Deep Learning, and becoming familiar with the tools that will be used in the project. These tools include several Python libraries, such as NumPy, TensorFlow, Theano and Keras.

Some preparatory work with the programming tools and libraries was already done during summer 2016, using Python, NumPy and TensorFlow.

The project’s main goals are:

  1. Develop a Deep Learning-based system able to convert recorded speech from one speaker into that of another speaker
    1. Gain a profound understanding of Deep Learning architectures
    2. Acquire solid knowledge of the Keras Deep Learning Python library
    3. Propose an innovative architecture that follows the state of the art in Deep Learning for Voice Conversion
    4. Evaluate the developed system, aiming for conversion that performs better than the systems submitted to the Interspeech 2016 Voice Conversion Challenge