
1805.06749

Yana edited this page Jul 10, 2019 · 1 revision

BMVC 2018

[arXiv 1805.06749] Action Completion: A Temporal Model for Moment Detection [project page] [PDF] [notes]

Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

read 2019/07/03

Objective

Detect the completion moment of an action: the moment at which the action's goal is considered achieved.

Action recognition often doesn't focus on detecting whether an action's aim has been achieved.

Action completion differs from end-of-action localization, as it focuses on whether the goal of the action has been achieved.

Use a convolutional-recurrent neural network for the task of predicting completion

Classification voting

At each time step t, the sequence is split into two parts (frames up to and including time step t, and frames from time step t+1 onwards). The classification vote indicates which of the two parts contains the completion moment.
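A minimal sketch of how such per-frame votes could be accumulated. It assumes (this is not spelled out in these notes) that a frame classified as pre-completion votes for every later frame, and a frame classified as post-completion votes for itself and every earlier frame. The function name and the 0/1 label encoding are hypothetical.

```python
import numpy as np

def classification_vote(post_labels):
    """Return the index of the frame that collects the most votes.

    post_labels[t] is 1 if frame t is predicted post-completion, else 0
    (hypothetical encoding, chosen for illustration).
    """
    post_labels = np.asarray(post_labels)
    n = len(post_labels)
    votes = np.zeros(n, dtype=int)
    for t, is_post in enumerate(post_labels):
        if is_post:
            votes[: t + 1] += 1  # completion lies at or before frame t
        else:
            votes[t + 1 :] += 1  # completion lies after frame t
    return int(np.argmax(votes))
```

With clean labels such as `[0, 0, 0, 1, 1, 1]` the vote peaks at the first post-completion frame; a single mislabelled frame shifts the peak only if enough other frames agree with it.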

Regression voting

At t, predict the relative position of the completion moment
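A rough sketch of regression voting, assuming the "relative position" is a signed offset in frames from the current frame to the completion moment (the paper may use a different parameterization). Each frame casts one vote for the frame it points at. The function name is illustrative.

```python
import numpy as np

def regression_vote(offsets):
    """Return the frame index that receives the most regression votes.

    offsets[t] is the predicted signed distance (in frames) from frame t
    to the completion moment (assumed parameterization).
    """
    offsets = np.asarray(offsets, dtype=float)
    n = len(offsets)
    # Frame t votes for frame t + offsets[t], rounded and clipped to the sequence.
    targets = np.clip(np.rint(np.arange(n) + offsets), 0, n - 1).astype(int)
    votes = np.bincount(targets, minlength=n)
    return int(np.argmax(votes))
```

Taking the most-voted frame makes the estimate robust to a few frames with poor offset predictions.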

Synthesis

When combining contributions from frames prior to the completion moment, as well as frames post completion, the completion moment is detected with confidence

Datasets

Use 3 public datasets, including sports and daily actions

  • RGBD-AC (from their previous work)
  • HMDB and UCF101 (completion annotations for a subset of videos)

Method

Supervised problem with completion as binary classification

Momentary completion

Detect a specific frame at which we are confident that an action has been completed. Assumes labeling is consistent across images.
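As an illustration of the binary supervision, per-frame labels can be derived from a single annotated completion frame. This sketch assumes the annotated frame itself counts as post-completion; the helper name is hypothetical.

```python
def completion_labels(num_frames, completion_frame):
    """Label frames before the completion frame 0 and the rest 1.

    Assumption: the annotated completion frame is labelled post-completion.
    """
    return [0 if t < completion_frame else 1 for t in range(num_frames)]
```

For example, `completion_labels(5, 2)` gives `[0, 0, 1, 1, 1]`.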

Metrics

Evaluation metrics:

  • Accuracy (for every sequence, compute ratio of frames that are correctly labelled as pre or post-completion)
  • Relative distance error: normalized (by the length of the sequence) distance between predicted and ground truth completion moments
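The two metrics above can be sketched for a single sequence as follows. The function names are illustrative, and the labels are assumed to be 0 for pre-completion and 1 for post-completion.

```python
import numpy as np

def frame_accuracy(pred_labels, gt_labels):
    """Ratio of frames whose pre/post-completion label matches the ground truth."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    return float(np.mean(pred_labels == gt_labels))

def relative_distance_error(pred_frame, gt_frame, num_frames):
    """Distance between predicted and true completion frames, normalized by sequence length."""
    return abs(pred_frame - gt_frame) / num_frames
```

Normalizing by sequence length makes errors comparable across sequences of different durations.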

Experiments

Results

They correctly detect the completion moment within 1 second (30 frames) in 89% of all test sequences, and within 0.5 seconds in 74% of sequences.
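A figure of this kind can be computed as the fraction of sequences whose predicted completion frame falls within a frame tolerance of the ground truth. The sketch below is illustrative only: the function name is hypothetical and the numbers in the usage note are made up, not the paper's data.

```python
def fraction_within(pred_frames, gt_frames, max_frames):
    """Fraction of sequences with |predicted - true| completion frame <= max_frames."""
    hits = sum(abs(p - g) <= max_frames for p, g in zip(pred_frames, gt_frames))
    return hits / len(pred_frames)
```

With a tolerance of 30 frames (1 second at 30 fps), `fraction_within([10, 50, 100, 70], [12, 90, 100, 60], 30)` counts three of the four toy sequences as correct.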

Notes

Next reads

  • When will you do what? Anticipating temporal occurrences of activities.
  • Am I done? Predicting action progress in videos. arXiv preprint arXiv:1705.01781, 2018.
