A pretrained model is available to load into the decoder.
BLEU scores and top-k accuracies for VGG19 (orange) and ResNet152 (red), trained with teacher forcing:

| BLEU Score | Graph | Top-K Accuracy | Graph |
| --- | --- | --- | --- |
| BLEU-1 | (plot) | Training Top-1 | (plot) |
| BLEU-2 | (plot) | Training Top-5 | (plot) |
| BLEU-3 | (plot) | Validation Top-1 | (plot) |
| BLEU-4 | (plot) | Validation Top-5 | (plot) |
This was written in Python 3, so it may not work with Python 2. Download the COCO dataset training and validation images and put them in `data/coco/imgs/train2014` and `data/coco/imgs/val2014` respectively. Put the COCO dataset split JSON file from Deep Visual-Semantic Alignments in `data/coco/`; it should be named `dataset.json`.
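As a quick sanity check before preprocessing, the expected layout can be verified with a short script like the one below. The paths mirror the instructions above; the script itself is not part of the repository.

```python
import os

# Expected data layout, as described above.
expected = [
    'data/coco/imgs/train2014',
    'data/coco/imgs/val2014',
    'data/coco/dataset.json',
]

for path in expected:
    status = 'ok' if os.path.exists(path) else 'MISSING'
    print(f'{status:>7}  {path}')
```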
Run the preprocessing to create the needed JSON files:

`python generate_json_data.py`
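For context, `dataset.json` (the Deep Visual-Semantic Alignments split file) stores each image with a `split` field and its tokenized captions. The sketch below shows roughly how such a file can be broken into per-split JSON files; it only illustrates the data format and is not the actual contents of `generate_json_data.py`.

```python
import json
from collections import defaultdict

# Load the split file described above.
with open('data/coco/dataset.json') as f:
    dataset = json.load(f)

# Group images (and their tokenized captions) by split: 'train', 'val', 'test', ...
splits = defaultdict(list)
for img in dataset['images']:
    captions = [' '.join(sent['tokens']) for sent in img['sentences']]
    splits[img['split']].append({'filename': img['filename'], 'captions': captions})

# Write one JSON file per split (hypothetical output names).
for split, entries in splits.items():
    with open(f'data/coco/{split}.json', 'w') as f:
        json.dump(entries, f)
```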
Start the training by running:

`python train.py`

The models will be saved in `model/` and the training statistics will be saved in `runs/`. To see the training statistics, use:

`tensorboard --logdir runs`
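The statistics in `runs/` are TensorBoard event files. A training loop typically writes them with a summary writer along the lines of the sketch below, shown with `torch.utils.tensorboard`; the repository may use a different writer, and the tag names here are made up.

```python
from torch.utils.tensorboard import SummaryWriter

# Writing to runs/ is what makes `tensorboard --logdir runs` work.
writer = SummaryWriter(log_dir='runs')

for epoch in range(10):
    train_loss = 0.0  # placeholder; compute the real loss in the training loop
    val_bleu4 = 0.0   # placeholder; compute BLEU-4 on the validation set
    writer.add_scalar('loss/train', train_loss, epoch)
    writer.add_scalar('bleu4/val', val_bleu4, epoch)

writer.close()
```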
To generate a caption for an image with a trained model, run:

`python generate_caption.py --img-path <PATH_TO_IMG> --model <PATH_TO_MODEL_PARAMETERS>`
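`<PATH_TO_MODEL_PARAMETERS>` points at saved PyTorch parameters (either a model you trained yourself or the pretrained model mentioned above). If you need to load such a checkpoint in your own code, the usual device-aware pattern is sketched below; the checkpoint path and decoder class are placeholders, not names taken from this repository.

```python
import torch

# Use the GPU only when available, otherwise fall back to the CPU
# (the same behaviour described in the checklist below).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# map_location lets a checkpoint trained on a GPU be loaded on a CPU-only machine.
# 'model/decoder.pth' is a hypothetical path.
state_dict = torch.load('model/decoder.pth', map_location=device)

# decoder = Decoder(...)              # the repository's decoder class and arguments
# decoder.load_state_dict(state_dict)
# decoder.to(device)
# decoder.eval()
```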
Implementation checklist:

- Create image encoder class
- Create decoder class
- Create dataset loader
- Write main function for training and validation
- Implement attention model (see the attention sketch after this list)
- Implement decoder feed forward function
- Write training function
- Write validation function
- Add BLEU evaluation
- Update code to use GPU only when available, otherwise use CPU
- Add performance statistics
- Allow encoder to use resnet-152 and densenet-161
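The attention model in the checklist is the soft (additive) attention from the Bahdanau et al. paper referenced below: the decoder's hidden state is compared against every spatial location of the encoder features to produce weights, and the weighted sum of features becomes the context vector fed to the decoder. A minimal PyTorch sketch of that idea follows; the dimensions and names are illustrative, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) attention over spatial image features."""

    def __init__(self, feature_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feature_proj = nn.Linear(feature_dim, attn_dim)  # project encoder features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)    # project decoder hidden state
        self.score = nn.Linear(attn_dim, 1)                   # scalar score per location

    def forward(self, features, hidden):
        # features: (batch, num_locations, feature_dim) from the CNN encoder
        # hidden:   (batch, hidden_dim) from the RNN decoder
        scores = self.score(torch.tanh(
            self.feature_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                        # (batch, num_locations)
        alpha = torch.softmax(scores, dim=1)                  # attention weights sum to 1
        context = (features * alpha.unsqueeze(-1)).sum(dim=1) # (batch, feature_dim)
        return context, alpha

# Example: 196 spatial locations (14x14) of 512-d VGG19 features, 512-d hidden state.
attn = SoftAttention(feature_dim=512, hidden_dim=512, attn_dim=256)
context, alpha = attn(torch.randn(4, 196, 512), torch.randn(4, 512))
```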
References:

- Original Theano Implementation
- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015)