Show and Tell: A Neural Image Caption Generator

Task

Image Captioning: Generate natural sentences describing an image.

Network Design

  • CNN: GoogLeNet. The last hidden layer represents the image and is fed into the LSTM to generate the sentence. The image features are shown to the LSTM only at the first time step; feeding them at every step makes the model more prone to overfitting.
  • LSTM: Generates each word conditioned on the image features and the previously generated words.
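The input schedule described above (image first, then one word per step) can be sketched as a decoding loop. This is a minimal illustration, not the authors' implementation: `lstm_step`, `embed_word`, and the `(hidden, cell)` state layout are hypothetical callables and types supplied by the caller, and greedy argmax decoding is a simplification chosen for brevity (the paper's experiments use beam search).

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
State = Tuple[Vector, Vector]  # hypothetical (hidden, cell) pair


def greedy_caption(
    image_embedding: Vector,
    lstm_step: Callable[[Vector, State], Tuple[List[float], State]],
    embed_word: Callable[[int], Vector],
    initial_state: State,
    start_id: int,
    end_id: int,
    max_len: int = 20,
) -> List[int]:
    """Return word ids for a caption, picking the most likely word each step."""
    # First step: the image embedding is the only visual input the LSTM sees.
    # The word distribution produced at this step is ignored.
    _, state = lstm_step(image_embedding, initial_state)

    caption: List[int] = []
    word = start_id
    for _ in range(max_len):
        # Later steps: only the embedding of the previous word is fed in.
        probs, state = lstm_step(embed_word(word), state)
        word = max(range(len(probs)), key=probs.__getitem__)
        if word == end_id:
            break
        caption.append(word)
    return caption
```

Because the image enters only through the recurrent state, every later prediction depends on it indirectly, which matches the design choice noted in the CNN bullet.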

Objective

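The loss is the sum of the negative log likelihoods of the correct word at each step:

$$L(I, S) = -\sum_{t=1}^{N} \log p_t(S_t)$$

where $I$ is the image, $S_t$ is the correct word at step $t$, $p_t$ is the predicted word distribution at that step, and $N$ is the caption length.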
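As a small worked example of this objective, the function below sums the negative log probability assigned to the correct word at each step. It is a sketch under stated assumptions: `step_probs` is a hypothetical list holding one already-normalized distribution over the vocabulary per time step, and `target_ids` holds the ground-truth word ids.

```python
import math
from typing import Sequence


def caption_nll(
    step_probs: Sequence[Sequence[float]],
    target_ids: Sequence[int],
) -> float:
    """Sum over time steps of -log p_t(correct word)."""
    if len(step_probs) != len(target_ids):
        raise ValueError("need exactly one distribution per target word")
    return -sum(
        math.log(probs[word]) for probs, word in zip(step_probs, target_ids)
    )
```

For instance, `caption_nll([[0.5, 0.5], [0.25, 0.75]], [0, 1])` adds `-log(0.5)` and `-log(0.75)`. A model that assigns probability 1 to every correct word reaches a loss of 0.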

Training Details

  • Initialize the CNN from a pre-trained GoogLeNet and keep its weights fixed during training.
  • Update only the parameters of the embedding layer and the LSTM during training.
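The split between fixed and trainable weights can be illustrated with a toy update rule. This is only a sketch: the parameter-group names (`"cnn"`, `"embedding"`, `"lstm"`), the dictionary layout, and the use of plain SGD without momentum are assumptions made for the example, not details taken from these notes.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params, frozen: Set[str], lr: float) -> None:
    """Apply one plain SGD update in place, skipping groups named in `frozen`."""
    for name, values in params.items():
        if name in frozen:
            continue  # e.g. the pre-trained CNN: its weights stay fixed
        for i, g in enumerate(grads[name]):
            values[i] -= lr * g
```

Calling `sgd_step(params, grads, frozen={"cnn"}, lr=0.1)` leaves the `"cnn"` group untouched even when its gradients are non-zero, while the embedding and LSTM groups move against their gradients.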

Reference

Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.