This is a neural network architecture for image captioning, roughly based on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" by Xu et al. (ICML 2015). The input is an image, and the output is a sentence describing its content. The model first uses a convolutional neural network (CNN) to extract a feature map from the input image, then uses an LSTM recurrent neural network to decode these features into a natural language sentence. A soft attention mechanism lets the decoder focus on different image regions at each step, improving the quality of the caption.
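For readers unfamiliar with soft attention, the sketch below illustrates the core attention step (Bahdanau-style additive attention, as in Xu et al. 2015) in TensorFlow 2 / Keras style. It is a minimal illustration only: the layer names, dimensions, and Keras API here are assumptions for clarity and do not reflect this repository's actual code.

```python
import tensorflow as tf

class SoftAttention(tf.keras.layers.Layer):
    """Additive (soft) attention over CNN feature-map regions.

    features: flattened CNN feature map, shape [batch, num_regions, feature_dim]
    hidden:   previous LSTM hidden state, shape [batch, hidden_dim]
    """

    def __init__(self, units):
        super().__init__()
        self.w_feat = tf.keras.layers.Dense(units)    # projects image regions
        self.w_hidden = tf.keras.layers.Dense(units)  # projects LSTM state
        self.v = tf.keras.layers.Dense(1)             # scores each region

    def call(self, features, hidden):
        # Broadcast the projected hidden state over all regions: [batch, 1, units]
        hidden_exp = tf.expand_dims(self.w_hidden(hidden), 1)
        # Unnormalized attention scores per region: [batch, num_regions, 1]
        scores = self.v(tf.nn.tanh(self.w_feat(features) + hidden_exp))
        # Softmax over regions yields the attention weights alpha
        alpha = tf.nn.softmax(scores, axis=1)
        # Context vector: attention-weighted sum of region features
        context = tf.reduce_sum(alpha * features, axis=1)
        return context, alpha
```

At each decoding step, the context vector is concatenated with the word embedding and fed to the LSTM, so the caption generator can attend to different parts of the image for each word it emits.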
This project is implemented in TensorFlow and allows end-to-end training of both the CNN and RNN parts. To use it, you will need the TensorFlow version of the VGG16 or ResNet (50, 101, or 152 layers) model, which can be obtained with Caffe-to-TensorFlow.
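As a rough sketch of what working with a converted model looks like, Caffe-to-TensorFlow typically exports the pretrained weights as a NumPy dictionary file that can then be loaded and inspected before being assigned to TensorFlow variables. The file name `vgg16.npy` and the per-layer structure below are assumptions for illustration, not this project's guaranteed conventions.

```python
import numpy as np

# Hypothetical example: load weights exported by Caffe-to-TensorFlow.
# The converter commonly writes a dict of {layer_name: {param_name: array}}.
data = np.load('vgg16.npy', allow_pickle=True, encoding='latin1').item()

for layer_name, params in data.items():
    # Each entry typically holds the layer's weights and biases as arrays.
    print(layer_name, {name: arr.shape for name, arr in params.items()})
```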
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. ICML 2015.
- The original implementation in Theano
- An earlier implementation in TensorFlow
- Microsoft COCO dataset
- Caffe to TensorFlow