Weird results when applying SemGCN to 2D pose from image #32

Open
duckduck-sys opened this issue Oct 28, 2020 · 7 comments

@duckduck-sys

Inference on in-the-wild images using SemGCN has been partially covered in this thread and others, but only the overall process has been made clear, i.e.:

  • Step 1: Use a 2D pose estimation network to generate 2D pose in MPII format.
  • Step 2: Convert 2D pose from MPII format to H36M format as done here.
  • Step 3: Pre-process the 2D input pose as done here.
  • Step 4: Use the pre-processed 2D pose in H36M format as input to the SemGCN SH model. It outputs 3D pose in H36M format.

Below I will follow each step, using the 300x600 test image shown below on the left.

[images: original_image (left), 2d_put (right)]

For Step 1, I use EfficientPose to generate the MPII-format 2D pose of the test image, shown above on the right. Here is the numeric output:

positions = [[[108. 512.]	# Right ankle
              [114. 428.]	# Right knee
              [124. 320.]	# Right hip
              [186. 324.]	# Left hip
              [178. 426.]	# Left knee
              [176. 512.]	# Left ankle
              [156. 322.]	# Pelvis
              [162. 152.]	# Thorax
              [164. 114.]	# Upper neck
              [166.  24.]	# Head top
              [ 60. 322.]	# Right wrist
              [ 78. 238.]	# Right elbow
              [ 96. 148.]	# Right shoulder
              [230. 154.]	# Left shoulder
              [240. 246.]	# Left elbow
              [224. 326.]]]	# Left wrist

For Step 2, I run this:

positions = positions[:, SH_TO_GT_PERM, :]

To get the output:

positions = [[[156. 322.]
              [124. 320.]
              [114. 428.]
              [108. 512.]
              [186. 324.]
              [178. 426.]
              [176. 512.]
              [162. 152.]
              [164. 114.]
              [166.  24.]
              [230. 154.]
              [240. 246.]
              [224. 326.]
              [ 96. 148.]
              [ 78. 238.]
              [ 60. 322.]]]
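For reference, SH_TO_GT_PERM is the index permutation that reorders the 16 MPII/Stacked-Hourglass joints into the 16-joint H36M order expected by the SemGCN SH model. The following is a sketch reconstructed from the two arrays above (joint names follow the Step 1 comments):

import numpy as np

# H36M slot                <- MPII source index (see the Step 1 comments)
SH_TO_GT_PERM = np.array([
    6,   # hip / pelvis    <- pelvis
    2,   # right hip       <- right hip
    1,   # right knee      <- right knee
    0,   # right foot      <- right ankle
    3,   # left hip        <- left hip
    4,   # left knee       <- left knee
    5,   # left foot       <- left ankle
    7,   # spine           <- thorax
    8,   # thorax / neck   <- upper neck
    9,   # head            <- head top
    13,  # left shoulder   <- left shoulder
    14,  # left elbow      <- left elbow
    15,  # left wrist      <- left wrist
    12,  # right shoulder  <- right shoulder
    11,  # right elbow     <- right elbow
    10,  # right wrist     <- right wrist
])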

For Step 3, I run this:

positions[..., :2] = normalize_screen_coordinates(positions[..., :2], w=300, h=600)

To get the output:

positions = [[[ 0.0399  0.1466 ]
              [-0.1733  0.1333 ]
              [-0.2400  0.8533 ]
              [-0.2799  1.4133 ]
              [ 0.2400  0.1600 ]
              [ 0.1866  0.8399 ]
              [ 0.1733  1.4133 ]
              [ 0.0800 -0.9866 ]
              [ 0.0933 -1.2400 ]
              [ 0.1066 -1.8400 ]
              [ 0.5333 -0.9733 ]
              [ 0.6000 -0.3600 ]
              [ 0.4933  0.1733 ]
              [-0.3600 -1.0133 ]
              [-0.4800 -0.4133 ]
              [-0.6000  0.1466 ]]]
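For reference, normalize_screen_coordinates maps x from [0, w] to [-1, 1] and scales y by the same factor to preserve the aspect ratio. A minimal sketch of what it computes, consistent with the numbers above:

import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map x from [0, w] to [-1, 1] while preserving the aspect ratio,
    # so y ends up in [-h/w, h/w]
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])

# e.g. [108, 512] with w=300, h=600 -> [-0.28, 1.4133], matching the right-ankle row above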

For Step 4, the above is used as input to the SemGCN SH model by running this:

inputs_2d = torch.from_numpy(positions)               # (1, 16, 2) normalized 2D pose in H36M order
inputs_2d = inputs_2d.to(device)
outputs_3d = model_pos(inputs_2d).cpu()               # (1, 16, 3) predicted 3D pose
outputs = outputs_3d[:, :, :] - outputs_3d[:, :1, :]  # root-center: subtract the hip (joint 0)

Which gives the output:

outputs = [[[ 0.0000  0.0000  0.0000 ]
            [-0.0769 -0.6899 -0.2520 ]
            [ 0.0847 -0.4062 -0.0607 ]
            [ 0.4154  0.2318  0.4062 ]
            [ 0.2708 -0.5181 -0.0504 ]
            [ 0.3431 -0.7337  0.3018 ]
            [ 0.6379  0.6684  0.2033 ]
            [ 0.1650 -0.9141 -0.8496 ]
            [ 0.5825 -2.1341  0.2762 ]
            [ 1.1561 -1.5364 -0.6433 ]
            [ 1.1612 -1.1453 -0.2103 ]
            [ 0.9097 -0.6763  0.2361 ]
            [ 0.8202 -0.2971  0.2679 ]
            [ 0.8008 -1.1936 -0.1120 ]
            [ 0.2124 -1.3246  0.5563 ]
            [ 0.5093 -0.4762  0.3473 ]]]

When visualized, this looks completely wrong; see the image below. Can anyone point out where the problem lies? Is it a problem with the pre-processing, or with the model?

[image: 3d_output]

@develduan

@duckduck-sys I think there are two points to note about the data:

  1. location: the neck should be halfway between the shoulders, and the thorax should be roughly halfway between the neck and the hips.
  2. scale: the normalization depends on the image width (mapping to [-1, 1] based on w); thus the proportion of the human body in the image needs to match H36M. Both adjustments are sketched after the images below.
  • raw output: [image: pose_lifting_output271_unscale_raw]
  • after scaling the locations: [image: pose_lifting_output271_raw]
  • after modifying the location of the neck and the thorax: [image: pose_lifting_output271_modified]
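A minimal sketch of both adjustments, assuming positions_mpii is the (1, 16, 2) MPII-format array from Step 1 (indices follow its joint comments) and reusing SH_TO_GT_PERM and normalize_screen_coordinates from Steps 2-3; the final scale factor is only an example and depends on the image:

import numpy as np

# MPII joint indices, per the comments on the Step 1 output
I_PELVIS, I_THORAX, I_NECK = 6, 7, 8
I_RSHOULDER, I_LSHOULDER = 12, 13

pose = positions_mpii[0].copy()                       # (16, 2) keypoints in MPII order

# 1) location: neck = midpoint of the shoulders,
#    thorax = roughly halfway between the neck and the pelvis
pose[I_NECK] = (pose[I_RSHOULDER] + pose[I_LSHOULDER]) / 2
pose[I_THORAX] = (pose[I_NECK] + pose[I_PELVIS]) / 2

# 2) scale: reorder to H36M, normalize, then shrink so the body occupies
#    a proportion of the frame closer to H36M (factor 2 is only an example)
pose = pose[SH_TO_GT_PERM]
pose = normalize_screen_coordinates(pose, w=300, h=600)
pose = pose / 2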

@dandingol03

> @duckduck-sys I think there are two points to note about the data: […]

Hi, how do you calculate the spine point?

@dandingol03

Hi @duckduck-sys, your data looks a bit strange after normalization. Also, I'd like to ask whether you can now regress the 3D pose correctly. Mainly, I don't know how to compute the hip and spine joints, and should the hip joint be assumed to be (0, 0)?

@lisa676

lisa676 commented Dec 17, 2020

@develduan Hi Duan, can you share your solution? I'm also facing much the same problem.
Thanks

@dandingol03

@develduan Hi, I also face the same problem of how to figure out the spine point, because the stacked hourglass network doesn't output a spine point.

@develduan
Copy link

@lisa676 @dandingol03 Hi, I'm sorry, but I stopped following this project because it didn't work very well on my dataset (an in-the-wild environment). In my dataset all pedestrians stand upright, so I simply treated the midpoint of the neck and the pelvis as the thorax/spine: positions_mpii[i_thorax] = (positions_mpii[i_neck] + positions_mpii[i_pelvis]) / 2. After normalize_screen_coordinates, scale the locations by a factor so that the proportion of the human body in the image matches H36M; in my case: positions = positions / 2.

In my case, I want to get the 3D pose directly from the image instead of first estimating a 2D pose and then lifting it to 3D, and I got better results by following the paper "End-to-end Recovery of Human Shape and Pose" by Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik.

@dandingol03

@develduan Firstly, thanks for your kind reply. Secondly, the paper "End-to-end Recovery of Human Shape and Pose" is cool; I will delve into it soon. And lastly, here is my email: [email protected]. Maybe someday we can exchange ideas about 3D pose estimation.
