Code for customized implementation of stacked GAN model, used for text & audio to image synthesis.

himanshu-dutta/RemixArt

RemixArt

Whether it is a book cover, album art, or just a template for a simple project, we are constantly searching the web for designs. Even with a rough idea in mind, we often cannot find what we need. This project aims to develop a model that takes input in the form of text, images, and even audio and attempts to produce a picture or work of art that is as photorealistic as possible. To illustrate, we work with a dataset of songs, album covers, artist images, and song lyrics to generate close-to-real artwork. The same idea can be applied in other domains where a lot of information is available in multiple formats. We use GANs, modified to accept inputs from three distinct channels, and train the model to learn an embedding for each of the three channels.

Notes and changes related to the project are recorded here to keep track of it.

Generating album art using 2 channels of input (reduced from 3), drawn from audio, images and/or text.

Himanshu's:

  • Model Selection and Learning to Apply It
  • PyTorch
  • Applying the model in PyTorch
  • Figuring Out How to Actually Transfer the Workflow to GCloud

Archita's:

  • Data Scraping and Collection
  • Deciding on the Data Source
  • Storage and Retrieval for efficient processing, locally or over cloud buckets.
  • Choice of Database that would work well with the project.

Citation:

We built on the PyTorch implementation of the StackGAN architecture and updated it to work with a recent version of PyTorch. We modified both the procedure the original model follows and its conditional augmentation technique and embedding representation, for which we use a vanilla model consisting of one dense layer and a ReLU unit.

@inproceedings{han2017stackgan,
  author    = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
  title     = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
  year      = {2017},
  booktitle = {{ICCV}},
}
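
As a rough illustration of the conditional augmentation step mentioned above (called conditioning augmentation in the StackGAN paper), here is a minimal sketch using only the Python standard library. It is not the code from this repository: the names `dense_relu`, `conditioning_augmentation` and `kl_to_standard_normal`, the dimensions, and the random weights are all assumptions made for illustration. In the project itself this would presumably be written with PyTorch (for example `torch.nn.Linear` and `torch.randn_like`). The sketch passes an embedding through one dense layer and a ReLU, splits the result into a mean and a log-variance, and samples a conditioning vector with the reparameterization trick.

```python
import math
import random


def dense_relu(x, weights, bias):
    """One dense layer followed by ReLU: max(0, W x + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def conditioning_augmentation(embedding, weights, bias, rng):
    """Map an embedding to (mu, logvar), then sample c = mu + sigma * eps."""
    hidden = dense_relu(embedding, weights, bias)
    half = len(hidden) // 2
    mu, logvar = hidden[:half], hidden[half:]
    # Reparameterization: eps ~ N(0, 1), sigma = exp(0.5 * logvar).
    cond = [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, logvar)
    ]
    return cond, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


# Hypothetical sizes and randomly initialised weights, for illustration only.
rng = random.Random(0)
embed_dim, cond_dim = 8, 4
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
    for _ in range(2 * cond_dim)
]
bias = [0.0] * (2 * cond_dim)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]

cond, mu, logvar = conditioning_augmentation(embedding, weights, bias, rng)
kl = kl_to_standard_normal(mu, logvar)
```

Here `cond` is the vector that would be fed to the generator alongside the noise. The `kl` value is the regularization term that pulls the conditioning distribution toward a standard normal. The same function could be applied to a text embedding or an audio embedding, assuming each has been projected to a fixed-length vector first.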
