Show and Tell: A Neural Image Caption Generator

Task

Image Captioning: Generate natural sentences describing an image.

Network Design

CNN: GoogLeNet. Its last hidden layer is used as the image representation and fed to the LSTM, which generates the sentence. The image features are fed only at the first time step of the LSTM; feeding them at every time step makes the model more prone to overfitting.
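LSTM: Generates each word from the image features and the previously generated words:

$$x_{-1} = \mathrm{CNN}(I)$$
$$x_t = W_e S_t, \quad t \in \{0, \dots, N-1\}$$
$$p_{t+1} = \mathrm{LSTM}(x_t), \quad t \in \{0, \dots, N-1\}$$

where $S_t$ is the one-hot vector of the $t$-th word, $W_e$ is the word embedding matrix, $S_0$ is a special start word and $S_N$ is a special stop word.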
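A minimal NumPy sketch of the CNN + LSTM design described above. It assumes a standard LSTM cell, a hypothetical linear layer `W_img` that maps the CNN feature into the word-embedding space, and a softmax output layer. All sizes, the class name `CaptionLSTM`, and the start-word index 0 are illustrative placeholders, and a random vector stands in for a real GoogLeNet feature. The point it shows is that the image enters the LSTM once, at t = -1, and the words S_0..S_{N-1} then produce the distributions p_1..p_N.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


class CaptionLSTM:
    """Hypothetical NumPy decoder; sizes and initialisation are illustrative."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_img = rng.normal(0, s, (embed_dim, feat_dim))  # CNN(I) -> embedding space
        self.W_e = rng.normal(0, s, (embed_dim, vocab_size))  # word embedding matrix
        self.W = rng.normal(0, s, (4 * hidden_dim, embed_dim + hidden_dim))  # all four gates
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0, s, (vocab_size, hidden_dim))  # hidden state -> word logits
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        """One standard LSTM step: input, forget, output gates and candidate cell."""
        H = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def forward(self, image_feat, words):
        """Return p_1..p_N as rows, given input words S_0..S_{N-1} (indices)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        # t = -1: the image is shown only once; its output distribution is not used.
        h, c = self.step(self.W_img @ image_feat, h, c)
        probs = []
        for w in words:
            x = self.W_e[:, w]  # equals W_e @ one_hot(w)
            h, c = self.step(x, h, c)
            probs.append(softmax(self.W_out @ h))
        return np.stack(probs)


model = CaptionLSTM(feat_dim=1024, embed_dim=64, hidden_dim=128, vocab_size=50)
feat = np.random.default_rng(1).normal(size=1024)  # stand-in for a CNN feature
P = model.forward(feat, [0, 7, 12, 3])  # 0 = hypothetical start word S_0
print(P.shape)  # (4, 50)
```

Each row of `P` is a distribution over the vocabulary for the next word, so a caption could be produced by feeding the chosen word back in at the next step.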

Objective

The sum of the negative log-likelihoods of the correct word at each step:

$$L(I, S) = -\sum_{t=1}^{N} \log p_t(S_t)$$
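A small sketch of this objective, assuming the per-step distributions p_1..p_N are already available as the rows of an array (for example, produced by the LSTM) and the correct words S_1..S_N are given as vocabulary indices. The function name `caption_nll` is hypothetical.

```python
import numpy as np


def caption_nll(probs, targets):
    """Sum over steps of -log p_t(S_t).

    probs:   array of shape (N, vocab_size); row t-1 holds p_t.
    targets: the N correct words S_1..S_N as vocabulary indices.
    """
    probs = np.asarray(probs)
    targets = np.asarray(targets)
    picked = probs[np.arange(len(targets)), targets]  # p_t(S_t) for every step
    return float(-np.sum(np.log(picked)))


# Toy example: a 3-word vocabulary and a 2-step caption.
probs = np.array([[0.5, 0.25, 0.25],
                  [0.1, 0.1, 0.8]])
print(caption_nll(probs, [0, 2]))  # -(log 0.5 + log 0.8), about 0.9163
```

In practice the log-probabilities would usually be computed directly from the output logits with a log-softmax, which avoids taking the log of very small probabilities.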

Training Details

Initialize the CNN with pre-trained GoogLeNet weights and keep them fixed during training.
Train only the parameters of the word embedding layer and the LSTM.
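A sketch of this setup as a plain SGD step, assuming the parameters are stored in hypothetical groups named `cnn`, `word_embedding` and `lstm` and that gradients have already been computed. The only thing it illustrates is that the CNN group is skipped during updates; the optimizer and learning rate are placeholders, not the paper's settings.

```python
import numpy as np

FROZEN = {"cnn"}  # hypothetical group name for the pre-trained CNN weights


def sgd_step(params, grads, lr=0.01):
    """Apply one SGD update to every parameter group except the frozen CNN."""
    for name, value in params.items():
        if name in FROZEN:
            continue  # pre-trained CNN weights stay fixed
        value -= lr * grads[name]  # in-place update of embedding / LSTM weights
    return params


params = {name: np.ones(3) for name in ("cnn", "word_embedding", "lstm")}
grads = {name: np.ones(3) for name in params}
sgd_step(params, grads, lr=0.5)
print(params["cnn"])             # [1. 1. 1.]
print(params["word_embedding"])  # [0.5 0.5 0.5]
```

Because the CNN weights never change, its image features could also be computed once per image and cached, provided no image augmentation is applied.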

Reference

Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.