
Yana edited this page May 29, 2020 · 1 revision
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments, CVPR'19 {paper} {notes}

Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, and Jan Kautz

Objective

Generate valid 3D human stick poses given scene constraints

Datasets

Use the Sitcom dataset, which contains pose samples captured from sitcom videos, to train a human pose prediction model. The trained model is then adapted to SUNCG images to generate poses that follow natural human behaviors.

Method
  • model the distributions of pose locations and gestures with two conditional VAEs
  • scene representation: image encoded with ResNet-18
  • two-step prediction: first a location in the scene, then the pose at that location
  • adversarial loss on predicted locations
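The two-step "where, then what" generation above can be sketched as follows. This is a toy NumPy sketch, not the paper's implementation: the scene encoder, the CVAE decoders, and all weights are hypothetical random stand-ins, and only the conditioning structure (pose conditioned on scene feature and predicted location) mirrors the method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_scene(image):
    # Stand-in for the ResNet-18 scene encoder: here just a flat feature slice.
    return image.reshape(-1)[:16]

def sample_where(scene_feat, z_dim=4):
    # "Where" CVAE decoder (hypothetical linear stand-in):
    # maps scene feature + latent sample to an (x, y) location.
    z = rng.standard_normal(z_dim)
    w = rng.standard_normal((2, scene_feat.size + z_dim)) * 0.1
    return w @ np.concatenate([scene_feat, z])

def sample_what(scene_feat, loc, z_dim=4, n_joints=17):
    # "What" CVAE decoder: conditioned on the scene feature AND the predicted
    # location, outputs a stick pose as n_joints (x, y) points around loc.
    z = rng.standard_normal(z_dim)
    cond = np.concatenate([scene_feat, loc, z])
    w = rng.standard_normal((n_joints * 2, cond.size)) * 0.1
    return loc + (w @ cond).reshape(n_joints, 2)

image = rng.standard_normal((8, 8, 3))
feat = encode_scene(image)
loc = sample_where(feat)        # step 1: location in the scene
pose = sample_what(feat, loc)   # step 2: gesture conditioned on the location
print(loc.shape, pose.shape)    # (2,) (17, 2)
```

Sampling different latents `z` at either stage yields different plausible locations or gestures for the same scene, which is the point of using CVAEs rather than a deterministic regressor.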
Experiments
  • Evaluation
    • **authenticity classifier**: train a pose authenticity classifier (86% accuracy) by automatically generating synthetic poses and manually annotating the implausible ones; the metric is the ratio of generated poses that the classifier labels as positive
    • geometry score: a free-space constraint and a support constraint (an affordable surface within 8 voxel units of the pose)
    • user study: given a pair of poses, one sampled from the ground-truth poses and one generated, a user is asked to select the pose that is more reasonable in an indoor environment. (They report almost 50%, which is pretty high! Evaluated only on 2D projections.)