This is the PyTorch implementation of our paper "SHOW AND SPEAK: DIRECTLY SYNTHESIZE SPOKEN DESCRIPTION OF IMAGES". More details can be found on the project page.
python 3.6
pytorch 1.4.0
scipy 1.2.1
You can download our processed database from Flickr8k_SAS. Then unzip the file in the root directory of the code. You should get the following directory tree:
├── Data_for_SAS
│   ├── bottom_up_features_36_info
│   ├── images
│   ├── mel_80
│   ├── wavs
│   ├── train
│   │   ├── filenames.pickle
│   ├── val
│   │   ├── filenames.pickle
│   ├── test
│   │   ├── filenames.pickle
Among them, "bottom_up_features_36_info" contains the extracted bottom-up features of the images; "images" contains all raw images of Flickr8k; "mel_80" contains the mel spectrograms of the audio files; "wavs" contains all the speech synthesized by the TTS system.
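As a quick sanity check on the unzipped data, the sketch below loads a split file and one sample. It assumes that `filenames.pickle` holds a plain Python list of sample IDs and that the mel spectrograms and bottom-up features are stored as per-sample `.npy` files; the exact file naming and formats in the released package may differ.

```python
import os
import pickle
import numpy as np

data_dir = "Data_for_SAS"

# Load the list of training sample IDs (assumed to be a plain Python list).
with open(os.path.join(data_dir, "train", "filenames.pickle"), "rb") as f:
    train_ids = pickle.load(f)
print(f"{len(train_ids)} training samples, e.g. {train_ids[0]}")

# Hypothetical per-sample files; the real layout may use different extensions.
sample_id = train_ids[0]
mel = np.load(os.path.join(data_dir, "mel_80", sample_id + ".npy"))  # (80, T) mel spectrogram
feats = np.load(os.path.join(data_dir, "bottom_up_features_36_info",
                             sample_id + ".npy"))                    # 36 region features
print("mel shape:", mel.shape, "| bottom-up features shape:", feats.shape)
```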
Run
python train.py --data_dir Data_for_SAS --save_path outputs
Download the pre-trained WaveGlow model and put it in the root directory of this code.
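For reference, this is roughly how a pre-trained WaveGlow checkpoint is typically used as a vocoder to turn a predicted 80-band mel spectrogram into a waveform, following NVIDIA's published usage. The checkpoint filename and the dummy mel tensor are assumptions, and loading the checkpoint requires the WaveGlow model code (glow.py) to be importable.

```python
import torch

# Load the published WaveGlow checkpoint (filename is an assumption).
checkpoint = torch.load("waveglow_256channels.pt", map_location="cpu")
waveglow = checkpoint["model"]
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow.cuda().eval()

# Dummy 80-band mel spectrogram standing in for a model prediction: (batch, 80, frames).
mel = torch.randn(1, 80, 200).cuda()

with torch.no_grad():
    audio = waveglow.infer(mel, sigma=0.666)  # (1, num_samples) waveform
```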
Run
python train.py --data_dir Data_for_SAS --save_path outputs --only_val
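If you want to save generated waveforms yourself, scipy (already a dependency) can write them to disk. This is a minimal sketch; the sampling rate is an assumption and should match the one used to produce the wavs in `Data_for_SAS`, and the random array is only a placeholder for a waveform produced by WaveGlow.

```python
import numpy as np
from scipy.io import wavfile

sampling_rate = 22050  # assumed; use the rate of the original TTS wavs
audio = np.random.randn(sampling_rate)  # placeholder for a synthesized waveform

# Normalize to the int16 range before writing.
audio = (audio / np.max(np.abs(audio)) * 32767).astype(np.int16)
wavfile.write("example.wav", sampling_rate, audio)
```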
@article{wang2020show,
title={Show and Speak: Directly Synthesize Spoken Description of Images},
author={Wang, Xinsheng and Feng, Siyuan and Zhu, Jihua and Hasegawa-Johnson, Mark and Scharenborg, Odette},
journal={arXiv preprint arXiv:2010.12267},
year={2020}
}