Tensorflow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

This is a neural network architecture for image captioning, roughly based on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" by Xu et al. (ICML 2015). The input is an image, and the output is a sentence describing its content. A convolutional neural network first extracts feature vectors from the input image, and an LSTM recurrent neural network then decodes these features into a natural language sentence. A soft attention mechanism is incorporated to improve the quality of the captions.
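
To make the mechanism concrete, below is a minimal sketch of the soft attention step in TensorFlow, written against the tf.keras API for readability; the layer names (W_feat, W_hid, v) are illustrative and are not this repo's variables. At each decoding step, the LSTM hidden state scores every spatial region of the CNN feature map, and the softmax-weighted sum of region features becomes the context vector.

```python
import tensorflow as tf

# Illustrative soft-attention layer (Bahdanau-style additive scoring);
# a sketch of the mechanism, not this repo's exact implementation.
class SoftAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W_feat = tf.keras.layers.Dense(units)  # projects CNN region features
        self.W_hid = tf.keras.layers.Dense(units)   # projects the LSTM hidden state
        self.v = tf.keras.layers.Dense(1)           # one scalar score per region

    def call(self, features, hidden):
        # features: [batch, num_regions, feat_dim]; hidden: [batch, hidden_dim]
        hidden = tf.expand_dims(hidden, 1)  # [batch, 1, hidden_dim], broadcasts over regions
        scores = self.v(tf.nn.tanh(self.W_feat(features) + self.W_hid(hidden)))
        alpha = tf.nn.softmax(scores, axis=1)              # attention weights over regions
        context = tf.reduce_sum(alpha * features, axis=1)  # weighted sum -> [batch, feat_dim]
        return context, alpha
```

At each time step the context vector is combined with the previous word's embedding and fed to the LSTM, which predicts the next word; the alpha weights indicate where in the image the model is "looking".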

This project is implemented in TensorFlow and supports end-to-end training of both the CNN and RNN parts. To use it, you will need the TensorFlow version of the VGG16 or ResNet (50, 101, or 152 layer) model, which can be obtained with the Caffe-to-TensorFlow converter.
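
As a point of reference, recent TensorFlow releases also bundle ImageNet-pretrained VGG16 and ResNet weights in tf.keras.applications, which can stand in for the Caffe conversion step when experimenting. A minimal feature-extraction sketch (this is an alternative for illustration, not this repo's loading code): the 14×14×512 conv5_3 map is flattened into 196 region vectors, the form the attention mechanism expects.

```python
import tensorflow as tf

# Sketch: pull the conv5_3 feature map from a pretrained VGG16 via
# tf.keras.applications -- an alternative to converting Caffe weights,
# not this repo's loading code.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
extractor = tf.keras.Model(base.input, base.get_layer("block5_conv3").output)

image = tf.random.uniform([1, 224, 224, 3])  # stand-in for a real image in [0, 1]
fmap = extractor(tf.keras.applications.vgg16.preprocess_input(image * 255.0))
features = tf.reshape(fmap, [1, -1, 512])    # [1, 196, 512]: one vector per region
```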

Examples

(Example images with their generated captions.)

References

  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." ICML 2015.
