1603.07763.md
Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video, CVPR'17 {paper} {project page} {code.gz} {dataset.zip}
Hao Jiang, Kristen Grauman
Goes beyond previous work that reconstructs only visible first-person poses (visible arms).
Learns a prior over full-body motion given visible cues from the environment.
- Collected with both 3rd-person (Kinect) and 1st-person (chest-mounted GoPro) views; both provide RGB streams
- Ground-truth human poses captured with a Kinect V2 sensor
- 3D positions of the 25 body joints defined in the MS Kinect SDK
- Chest-mounted camera provides the egocentric view
- 18 ground-truth videos; 3 videos for training, the rest for testing
- 10 subjects performing normal daily activities
- Handles pose estimation as a per-frame classification task
- k-means with the L2 norm on all ground-truth poses in the training set to obtain K=300 pose clusters, which serve as the per-frame class labels (see the clustering sketch after this list)
- Dynamic features
  - use optical flow to compute point correspondences, which are used to estimate a homography (underlying assumption that the scene is planar? Only one homography for the full scene is computed, as far as I understand; I am missing something here)
  - use homographies between consecutive frames to estimate camera rotation, assuming rotation dominates over translation and that camera intrinsics are known (see the flow/homography sketch after this list)
- Static features
  - collect a dataset of standing vs. sitting frames and train a classifier to distinguish the two postures
- Additional temporal model over 1-3 minute sequences that
  - constrains transitions between pose clusters to those observed in the training set (see the smoothing sketch after this list)
  - encourages consistency between predictions from static and motion features
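A minimal sketch of the pose-quantization step, assuming ground-truth poses come as (N, 25, 3) arrays of Kinect joint positions; the function and variable names are illustrative, not from the released code:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_pose_clusters(train_poses, k=300, seed=0):
    """Cluster flattened 3D poses with k-means (L2 distance) into k pose classes."""
    flat = train_poses.reshape(len(train_poses), -1)   # (N, 75)
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(flat)

def pose_to_class(km, pose):
    """Map a single 25x3 pose to its nearest cluster id (the classification label)."""
    return int(km.predict(pose.reshape(1, -1))[0])

# Example: quantize a (synthetic) training set and label one pose.
train_poses = np.random.rand(5000, 25, 3)
km = build_pose_clusters(train_poses, k=300)
label = pose_to_class(km, train_poses[0])
```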
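A sketch of the motion-feature step, assuming OpenCV and a known intrinsic matrix K; the paper's exact feature extraction may differ, this only illustrates flow -> correspondences -> single homography -> rotation under a rotation-dominant model:

```python
import cv2
import numpy as np

def camera_rotation(prev_gray, curr_gray, K):
    """Estimate inter-frame camera rotation from tracked points."""
    # Sparse points in the previous frame, tracked with pyramidal Lucas-Kanade flow.
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=8)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    good0 = p0[status.ravel() == 1].reshape(-1, 2)
    good1 = p1[status.ravel() == 1].reshape(-1, 2)

    # One homography for the whole frame, robust to outliers via RANSAC.
    H, _ = cv2.findHomography(good0, good1, cv2.RANSAC, 3.0)

    # For a purely rotating camera, H = K R K^-1, so R ~ K^-1 H K (up to scale).
    R = np.linalg.inv(K) @ H @ K
    # Project onto the closest true rotation matrix.
    U, _, Vt = np.linalg.svd(R)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        R = -R
    return R
```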
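The paper's temporal model is a more elaborate optimization; the following is only a Viterbi-style sketch of how transitions between pose clusters can be restricted to those seen in training, given per-frame class scores (e.g. from the static/motion classifiers):

```python
import numpy as np

def smooth_with_transitions(frame_scores, allowed):
    """frame_scores: (T, K) per-frame log-scores; allowed: (K, K) boolean matrix
    where allowed[i, j] is True if transition cluster i -> j occurs in training."""
    T, K = frame_scores.shape
    trans_penalty = np.where(allowed, 0.0, -np.inf)   # forbid unseen transitions
    dp = frame_scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = dp[:, None] + trans_penalty          # (K, K): prev i -> current j
        back[t] = np.argmax(scores, axis=0)           # best predecessor per cluster
        dp = scores[back[t], np.arange(K)] + frame_scores[t]
    # Backtrack the highest-scoring pose-cluster path.
    path = [int(np.argmax(dp))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```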
Runtime: 0.5 s per frame
Reports mean errors in cm for the different joints (metric sketched below).
Compares to several 3rd-person baselines on their dataset. For upper-body joints, results are slightly better than an always-standing baseline (which predicts a fixed standing pose); the improvement on lower-body joints is clearer.
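A small sketch of the reported metric, assuming predicted and ground-truth poses are (T, 25, 3) arrays in centimeters (array names are illustrative):

```python
import numpy as np

def mean_joint_error_cm(pred, gt):
    """Mean Euclidean distance per joint over all frames, in cm -> shape (25,)."""
    return np.linalg.norm(pred - gt, axis=2).mean(axis=0)
```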