Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create dall e #87

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 124 additions & 0 deletions dall e
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# Model Card: DALL·E dVAE

Following [Model Cards for Model Reporting (Mitchell et al.)](https://arxiv.org/abs/1810.03993) and [Lessons from
Archives (Jo & Gebru)](https://arxiv.org/pdf/1912.10389.pdf), we're providing some information about about the discrete
VAE (dVAE) that was used to train DALL·E.

## Model Details

The dVAE was developed by researchers at OpenAI to reduce the memory footprint of the transformer trained on the
text-to-image generation task. The details involved in training the dVAE are described in [the paper][dalle_paper]. This
model card describes the first version of the model, released in February 2021. The model consists of a convolutional
encoder and decoder whose architectures are described [here](dall_e/encoder.py) and [here](dall_e/decoder.py), respectively.
For questions or comments about the models or the code release, please file a Github issue.

## Model Use

### Intended Use

The model is intended for others to use for training their own generative models.

### Out-of-Scope Use Cases

This model is inappropriate for high-fidelity image processing applications. We also do not recommend its use as a
general-purpose image compressor.

## Training Data

The model was trained on publicly available text-image pairs collected from the internet. This data consists partly of
[Conceptual Captions][cc] and a filtered subset of [YFCC100M][yfcc100m]. We used a subset of the filters described in
[Sharma et al.][cc_paper] to construct this dataset; further details are described in [our paper][dalle_paper]. We will
not be releasing the dataset.

## Performance and Limitations

The heavy compression from the encoding process results in a noticeable loss of detail in the reconstructed images. This
renders it inappropriate for applications that require fine-grained details of the image to be preserved.

[dalle_paper]: https://arxiv.org/abs/2102.12092
[cc]: https://ai.google.com/research/ConceptualCaptions
[cc_paper]: https://www.aclweb.org/anthology/P18-1238/
[yfcc100m]: http://projects.dfki.uni-kl.de/yfcc100m/

# Overview

[[Blog]](https://openai.com/blog/dall-e/) [[Paper]](https://arxiv.org/abs/2102.12092) [[Model Card]](model_card.md) [[Usage]](notebooks/usage.ipynb)

This is the official PyTorch package for the discrete VAE used for DALL·E. The transformer used to generate the images from the text is not part of this code release.

# Installation

Before running [the example notebook](notebooks/usage.ipynb), you will need to install the package using

pip install DALL-E

Pillow
blobfile
mypy
numpy
pytest
requests
torch
torchvision

from setuptools import setup

def parse_requirements(filename):
lines = (line.strip() for line in open(filename))
return [line for line in lines if line and not line.startswith("#")]

setup(name='DALL-E',
version='0.1',
description='PyTorch package for the discrete VAE used for DALL·E.',
url='http://github.com/openai/DALL-E',
author='Aditya Ramesh',
author_email='[email protected]',
license='BSD',
packages=['dall_e'],
install_requires=parse_requirements('requirements.txt'),
zip_safe=True)

# OS specific
*.DS_Store

# Python
/build
/dist
__pycache__
*.ipynb_checkpoints
*.egg-info

# Vim
*.vim
*.swk
*.swl
*.swm
*.swn
*.swo
*.swp

Modified MIT License

Software Copyright (c) 2021 OpenAI

We don’t claim ownership of the content you create with the DALL-E discrete VAE, so it is yours to
do with as you please. We only ask that you use the model responsibly and clearly indicate that it
was used.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
The above copyright notice and this permission notice need not be included
with content created by the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.