
Yana edited this page May 29, 2020 · 1 revision
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments, CVPR'19 {paper} {notes}

Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, and Jan Kautz

Objective

Generate valid 3D human stick poses given scene constraints

Datasets

Use the Sitcom dataset, which contains pose samples captured from sitcom videos, to train a human pose prediction model. The trained model is then adapted to SUNCG images to generate poses that follow natural human behaviors.

Method
  • model the distributions of pose locations and gestures with two conditional VAEs
  • scene representation: image encoded with ResNet-18
  • two-step prediction: first a location in the scene, then the pose at that location
  • adversarial loss on predicted locations
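The two-step "where, then what" generation above can be sketched as follows. This is a toy NumPy sketch, not the paper's implementation: the scene encoder, the CVAE decoders, and all weights are hypothetical random stand-ins, and only the conditioning structure (pose conditioned on scene feature and predicted location) mirrors the method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_scene(image):
    # Stand-in for the ResNet-18 scene encoder: here just a flat feature slice.
    return image.reshape(-1)[:16]

def sample_where(scene_feat, z_dim=4):
    # "Where" CVAE decoder (hypothetical linear stand-in):
    # maps scene feature + latent sample to an (x, y) location.
    z = rng.standard_normal(z_dim)
    w = rng.standard_normal((2, scene_feat.size + z_dim)) * 0.1
    return w @ np.concatenate([scene_feat, z])

def sample_what(scene_feat, loc, z_dim=4, n_joints=17):
    # "What" CVAE decoder: conditioned on the scene feature AND the predicted
    # location, outputs a stick pose as n_joints (x, y) points around loc.
    z = rng.standard_normal(z_dim)
    cond = np.concatenate([scene_feat, loc, z])
    w = rng.standard_normal((n_joints * 2, cond.size)) * 0.1
    return loc + (w @ cond).reshape(n_joints, 2)

image = rng.standard_normal((8, 8, 3))
feat = encode_scene(image)
loc = sample_where(feat)        # step 1: location in the scene
pose = sample_what(feat, loc)   # step 2: gesture conditioned on the location
print(loc.shape, pose.shape)    # (2,) (17, 2)
```

Sampling different latents `z` at either stage yields different plausible locations or gestures for the same scene, which is the point of using CVAEs rather than a deterministic regressor.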
Experiments
  • Evaluation
    • **authenticity classifier**: train a pose authenticity classifier (86% accuracy) by automatically generating synthetic poses and manually annotating the implausible ones; the metric is the ratio of generated poses that the classifier labels as positive
    • geometry score: a free-space constraint and a support constraint (an affordable surface within 8 voxel units of the pose)
    • user study: given a pair of poses, one sampled from the ground-truth poses and one generated, a user is asked to select the pose that is more reasonable in an indoor environment. (They report almost 50%, which is pretty high! Evaluated only on 2D projections.)