👀 What?

This repository contains the code for GROOViST: A Metric for Grounding Objects in Visual Storytelling, published in the Proceedings of EMNLP 2023.

🤔 Why?

Evaluating the degree to which textual stories are grounded in the corresponding image sequences is essential for the Visual Storytelling task. We propose GROOViST, based on insights obtained from existing open-source metrics (CLIPScore, RoViST-VG). Our analyses show that GROOViST effectively measures the extent to which a story is grounded in an image sequence.

🤖 How?

Currently, GROOViST can be used off-the-shelf for evaluating <image-sequence, story> pairs from three Visual Storytelling datasets: VIST, AESOP, and VWP. For a new or custom dataset, the steps below can be adapted accordingly.

Setup

Install Python (e.g., 3.11) and the dependencies listed in requirements.txt, for example with:

pip install -r requirements.txt
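
To confirm that the install succeeded before running the steps below, a quick check along these lines may help. This is a minimal sketch and not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parser only handles simple lines such as `name==version` (URLs, editable installs, and other pip options are not supported).

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract bare package names from simple requirements.txt lines."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r / -e
            continue
        # The name is the leading run of characters before any extras, specifier, or marker
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(requirement_names(open("requirements.txt").read()))` returns an empty list when every listed package is installed in the active environment. The check only looks at package presence, not at whether the installed versions match the pinned ones.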

Step 0: Extract image regions

For the sequence(s) of interest, GROOViST requires B image regions per image in the sequence(s) (e.g., B=10). Please refer to this doc for preparing them.

Step 1: Extract noun phrases

For the sequence(s) of interest, GROOViST works with the noun phrases in the stories. Use the following command for extracting noun phrases from stories:

python extract_nphrases.py --input_file data/sample_stories.json --output_file data/sample_nphrases.json

Step 2: Compute GROOViST scores

python groovist.py --dataset VIST --input_file data/sample_nphrases.json --output_file data/sample_scores.json
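
Steps 1 and 2 can also be chained from a single Python script, so that scoring only runs if noun-phrase extraction succeeds. The sketch below is an illustration under stated assumptions, not code from the repository: `build_commands` and `run_pipeline` are hypothetical helpers, the script names and flags are copied from the commands above, and it assumes the scripts are run from the repository root. Passing a dataset name other than VIST is assumed to work the same way but is not confirmed here.

```python
import subprocess
import sys


def build_commands(stories, nphrases, scores, dataset="VIST"):
    """Return the Step 1 and Step 2 command lines as argument lists."""
    py = sys.executable  # reuse the interpreter that has the dependencies installed
    step1 = [py, "extract_nphrases.py",
             "--input_file", str(stories),
             "--output_file", str(nphrases)]
    step2 = [py, "groovist.py",
             "--dataset", dataset,
             "--input_file", str(nphrases),  # Step 2 consumes the output of Step 1
             "--output_file", str(scores)]
    return [step1, step2]


def run_pipeline(stories, nphrases, scores, dataset="VIST", repo_dir="."):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(stories, nphrases, scores, dataset):
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Calling `run_pipeline("data/sample_stories.json", "data/sample_nphrases.json", "data/sample_scores.json")` reproduces the two commands shown above. Building the commands as argument lists rather than shell strings avoids quoting problems when file paths contain spaces.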

🔗 If you find this work useful, please consider citing it:

@inproceedings{surikuchi-etal-2023-groovist,
    title = "{GROOV}i{ST}: A Metric for Grounding Objects in Visual Storytelling",
    author = "Surikuchi, Aditya  and Pezzelle, Sandro  and Fern{\'a}ndez, Raquel",
    editor = "Bouamor, Houda  and Pino, Juan  and Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.202",
    pages = "3331--3339"
}
