on Screen gaze #19
Open
younesmch opened this issue Jan 23, 2022 · 15 comments
@younesmch

Hi, thanks again for the great work.
I just want to get the on-screen gaze point.
I compute the point of intersection between the screen plane and the gaze ray (as in the source code described at https://git.hcics.simtech.uni-stuttgart.de/public-projects/opengaze/-/wikis/API-calls), then I transform the intersection point into the screen coordinate system, but I get a wrong result.
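For concreteness, the ray-plane intersection being described looks roughly like this (a minimal sketch; the plane point p0, normal n, and the camera-to-screen rotation R_cam2screen and translation t_cam2screen are placeholders that would come from a screen-camera calibration such as the one OpenGaze uses):

import numpy as np

def gaze_point_on_screen(eye_pos, gaze_dir, p0, n, R_cam2screen, t_cam2screen):
    # eye_pos, gaze_dir: 3D eye position and gaze direction in the camera coordinate system
    # p0, n: a point on the screen plane and its normal, also in camera coordinates
    s = np.dot(p0 - eye_pos, n) / np.dot(gaze_dir, n)   # distance along the gaze ray
    hit_cam = eye_pos + s * gaze_dir                     # intersection in camera coordinates
    return R_cam2screen @ hit_cam + t_cam2screen         # the same point in screen coordinates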

Can someone tell me in which coordinate system the gaze vector and eye position are expressed?
Thanks in advance to anyone who can help.

@hysts (Owner) commented Jan 26, 2022

Hi, @younesmch

I think both the gaze vector and the head pose are in the camera coordinate system. So, I think we can get the gaze point on the screen with the following:

point_on_screen = face.center - face.gaze_vector * face.center[2] / face.gaze_vector[2]
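Spelled out, that line walks along the gaze ray from the face centre until the z-coordinate reaches zero, i.e. it treats the screen as the plane z = 0 in camera coordinates (a reasonable approximation when the camera sits on the screen). A minimal sketch, assuming face.center and face.gaze_vector are 3D numpy arrays in the camera coordinate system:

# ray parameter that makes the z-coordinate zero (the camera / screen plane)
t = face.center[2] / face.gaze_vector[2]
point_on_screen = face.center - t * face.gaze_vector
# point_on_screen[2] is ~0; the x and y components give the hit point, still in camera coordinates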

As I moved my head position in front of a camera while staring at a fixed point, the computed gaze points on the screen seemed to be consistent to some extent, though they were not very accurate and there seemed to be a problem with the y-coordinate being lower than expected.
I'm not sure what causes the problem with the y-coordinate. Maybe I've misunderstood something about the training data and there's a bug.

@younesmch (Author) commented Jan 26, 2022

Hi, @hysts
I compute the gaze point as the intersection between the screen plane and the gaze ray.
I keep the gaze point in the camera coordinate system (CCS) and draw a scatter plot of the X and Y values of the gaze point in the CCS.
My screen setup:
[image: screen setup]

What I get on the plot (Y data in red), with units in centimetres in the CCS:
[image: scatter plot of gaze point X and Y values]

The X data is acceptable, roughly between (-15, 15) cm in the CCS,
but the Y data is between (5, 18) cm, when in fact it should be between (0, 20) cm.
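A rough sketch of that kind of scatter check (an illustration only, assuming gaze_points is an (N, 3) array of screen-plane intersection points in the CCS, in centimetres):

import matplotlib.pyplot as plt
import numpy as np

gaze_points = np.load('gaze_points.npy')  # placeholder file of intersection points (N, 3), in cm
frames = np.arange(len(gaze_points))
plt.scatter(frames, gaze_points[:, 0], s=5, label='X (cm, CCS)')
plt.scatter(frames, gaze_points[:, 1], s=5, color='red', label='Y (cm, CCS)')
plt.xlabel('frame')
plt.ylabel('gaze point coordinate (cm)')
plt.legend()
plt.show()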

@tomilles commented Jan 26, 2022

@hysts I tried the line of code you mentioned, but when my head position in 3D is fixed and I only change the orientation of my head while looking at the same point on my screen, the point_on_screen coordinates change depending on the orientation of my head.

So in the line
point_on_screen = face.center - face.gaze_vector * face.center[2] / face.gaze_vector[2]
face.center can remain the same while a change in head orientation produces a different face.gaze_vector, and hence a different point on screen, even though my focus point stays the same. For example, if I switch from a left-looking to a right-looking head orientation while gazing at the same point, the X coordinate of the gaze vector changes accordingly, as expected, but this results in a different point_on_screen coordinate.

What is not clear to me is: relative to what are the pitch and yaw computed? When are they 0? Do we have to apply some kind of transformation opposite to the head rotation relative to the camera?
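If the convention is the usual MPIIGaze/ETH-XGaze one, where pitch = yaw = 0 means looking straight toward the camera (an assumption on my side, not something confirmed here), then the conversion from the angles back to a 3D direction would be roughly:

import numpy as np

def pitch_yaw_to_vector(pitch, yaw):
    # Assumed convention: pitch = yaw = 0 gives the direction (0, 0, -1), i.e. toward the camera.
    return -np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])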

I can send you a video/example to showcase the problem if needed.

@younesmch (Author)

@tomilles the gaze vector is computed in the screen coordinate system, and the units are metres.
The transformation depends on the location of the camera relative to the screen; in my case I apply only a translation.
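For example, with the camera mounted at the top-centre of the screen, a translation-only conversion could look roughly like this (a sketch; the offsets are placeholders for the actual camera position, and the axis signs may need flipping depending on the setup):

import numpy as np

# Placeholder offsets (metres): camera at the top-centre of the screen,
# screen origin at its top-left corner.
CAMERA_OFFSET_X = 0.26   # roughly half the screen width
CAMERA_OFFSET_Y = 0.0    # camera sits on the top edge

def camera_to_screen(point_cam):
    # point_cam: 3D intersection point in camera coordinates (metres)
    x_screen = point_cam[0] + CAMERA_OFFSET_X
    y_screen = point_cam[1] + CAMERA_OFFSET_Y
    return np.array([x_screen, y_screen])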

@tomilles

@younesmch so how do you get consistent/unchanged point_on_screen coordinates while gazing at the same point on screen but moving your head around? I tried many ways but I cannot seem to figure it out. Could you walk me through your steps or show me?

@younesmch (Author)

@tomilles I think the head pose is injected during training, as mentioned in the paper, so the model can predict the correct point independently of the head pose.
For my part, I just compute the intersection between the gaze ray and the screen plane as I mentioned, but the result is not meaningful, especially the Y of the gaze point.

@hysts (Owner) commented Jan 27, 2022

@younesmch @tomilles

I experimented to see how the predicted gaze point on the screen shifts depending on the head pose, and the following are the results:

I took 200-frame videos with my head pose fixed, looking in the direction of the camera, and plotted the results. The plots in the first row are for moving the head position in the X, Y, and Z directions, and the plots in the second row are for rotating the head in the pitch, yaw, and roll directions. The axes of the graphs are flipped for visualization, and the units are centimetres. The distance from the camera is basically about 50 cm, with "near" being about 25 cm and "far" being about 100 cm.

It seems that the predicted gaze vectors are off by about 20 degrees in the pitch direction, and that the predicted gaze point on the screen shifts in the X direction when the head rotates in the yaw direction.
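For reference, the pitch offset I mean can be measured roughly like this (a sketch; target_point here is the known 3D position of the point being stared at, in camera coordinates, which is an assumption about the setup):

import numpy as np

def pitch_of(v):
    # pitch of a 3D direction, using pitch = arcsin(-y / |v|)
    v = v / np.linalg.norm(v)
    return np.arcsin(-v[1])

def pitch_offset_deg(face_center, gaze_vector, target_point):
    true_direction = target_point - face_center   # direction the subject is actually looking
    return np.degrees(pitch_of(gaze_vector) - pitch_of(true_direction))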

I think something is wrong, but can't figure out what it is. This may take some time. Please let me know if you have any ideas.

@younesmch (Author) commented Jan 27, 2022

@hysts I think the problem is with the dataset, which covers only a limited range of gaze directions, so the model can't predict outside this range.
[figures: distributions of head angle (h) and gaze angle (g) in degrees for MPIIGaze]

@hysts (Owner) commented Jan 27, 2022

@younesmch

I forgot to mention it, but in the above experiment I used a model pretrained on the ETH-XGaze dataset, which covers a much wider range of gaze and head directions. The distribution bias in the dataset could be the cause, but I'm not sure at this point.

@younesmch (Author)

@hysts I can't see where the problem is; the model's prediction of the vertical position is not meaningful at all, and it only covers a small range of gaze.

@ffletcherr

@hysts I recreated the pitch/yaw labels for the MPIIFaceGaze dataset using a MediaPipe-based head pose estimator, your nicely developed tools, and the following script, which I borrowed from the official ETH-XGaze GitHub:

import numpy as np

# `estimator`, `face`, and `im` come from this repo's gaze estimation pipeline;
# `line` is one row of the MPIIFaceGaze annotation file, split into fields.
estimator.face_model_3d.estimate_head_pose(face, estimator.camera)
estimator.face_model_3d.compute_3d_pose(face)
estimator.face_model_3d.compute_face_eye_centers(face, 'ETH-XGaze')
estimator.head_pose_normalizer.normalize(im, face)

hR = face.head_pose_rot.as_matrix()             # head rotation matrix
euler = face.head_pose_rot.as_euler('XYZ')      # Euler angles (not used below)
hRx = hR[:, 0]                                  # head x-axis
forward = (face.center / face.distance).reshape(3)
down = np.cross(forward, hRx)
down /= np.linalg.norm(down)
right = np.cross(down, forward)
right /= np.linalg.norm(right)
R = np.c_[right, down, forward].T  # normalizing rotation matrix R

gaze_point = np.array(line[24:27], dtype=float)   # 3D gaze target in camera coordinates (mm)
face_center = np.array(line[21:24], dtype=float)  # annotation face centre (unused; face.center is used instead)
gc = gaze_point - face.center * 1000              # gaze direction; face.center is in metres
gc_normalized = np.dot(R, gc)                     # rotate into the normalized camera frame
gc_normalized = gc_normalized / np.linalg.norm(gc_normalized)
gaze_theta = np.arcsin(-gc_normalized[1])                     # pitch
gaze_phi = np.arctan2(-gc_normalized[0], -gc_normalized[2])   # yaw
gaze_norm_2d = np.asarray([gaze_theta, gaze_phi])

Then I fine-tuned your eth-xgaze_resnet18.pth using these new labels, and the shift decreased significantly. I uploaded the new model here (finetuned_eth-xgaze_resnet18.pth) so you can test it. So I think this problem comes from the normalization process (in label creation) or from the head pose estimation in the original dataset being wrong.

@hysts (Owner) commented Feb 9, 2022

Hi, @ffletcherr

Oh, that's wonderful! Thank you very much for the information.
Sorry for not posting any updates on this issue. I've been busy recently and haven't had time for OSS work.

I had thought that the discrepancy in the pitch direction could be due to differences in the 3D models, but I hadn't checked it myself. But looking at your results, it seems more likely that it was indeed the case.

By the way, it's just a small detail, but I'm not sure if the original normalization was "wrong". I think it's simply a difference in the 3D models used. I mean, the process of head pose estimation is like rotating a rigid face mask in 3D space to get the best fit based on facial landmarks, but if a different mask is used, the best fit pose could be different.
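In code, that fitting step is essentially a PnP problem; a rough sketch with OpenCV (the names here are placeholders, not this repo's actual API):

import cv2
import numpy as np

def fit_head_pose(model_points_3d, landmarks_2d, camera_matrix, dist_coeffs):
    # model_points_3d: (N, 3) points of the rigid 3D face model (the "mask")
    # landmarks_2d: (N, 2) detected facial landmarks in the image
    ok, rvec, tvec = cv2.solvePnP(
        model_points_3d.astype(np.float64),
        landmarks_2d.astype(np.float64),
        camera_matrix,
        dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE,
    )
    # rvec/tvec place the face model in camera coordinates; a different 3D model
    # can yield a systematically different best-fit pose for the same landmarks.
    return rvec, tvec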

Anyway, I will check differences in the 3D models and the model you trained soon. And thank you again, it's really helpful in narrowing down the problem.

@cancan101

Any updates on code to resolve the on screen gaze location?

I would even be open to starting with something simpler, such as just determining whether the face in the video keeps its eyes on the camera throughout the video.
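That simpler check could be approximated by thresholding the angle between the predicted gaze direction and the direction from the face to the camera, roughly like this (a sketch, assuming face_center and gaze_vector are 3D vectors in camera coordinates):

import numpy as np

def looks_at_camera(face_center, gaze_vector, max_angle_deg=10.0):
    to_camera = -face_center / np.linalg.norm(face_center)   # direction from the face to the camera origin
    gaze = gaze_vector / np.linalg.norm(gaze_vector)
    angle = np.degrees(np.arccos(np.clip(np.dot(gaze, to_camera), -1.0, 1.0)))
    return angle < max_angle_deg

Applied per frame, the fraction of frames where this returns True gives a crude measure of whether the person keeps their eyes on the camera.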
