Different results for e1 and e2 #4
We very much welcome your reproduction attempt! I'll start by looking at your e1 numbers: the sentence IoU is bang on at 79%, but the sign IoU is higher than expected (67% compared to the 63% reported in the paper). Looking at e2: you get 63% on sentences, which we never do (we always get more, except for E5), and 56% on sign, which is lower than our 66%. Now that all the facts are there, there are some things to check.
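To make the metric under discussion concrete, here is a minimal sketch of a frame-level IoU between a predicted and a reference segmentation mask. Note this is an assumption about how the sentence/sign IoU is computed; the paper's implementation may match segments differently.

```python
import numpy as np

def frame_iou(pred: np.ndarray, gold: np.ndarray) -> float:
    """IoU between two binary per-frame masks (1 = inside a segment)."""
    pred, gold = pred.astype(bool), gold.astype(bool)
    union = np.logical_or(pred, gold).sum()
    if union == 0:
        return 1.0  # both empty: treat as perfect agreement
    return np.logical_and(pred, gold).sum() / union

# Toy example: a prediction that starts one frame late.
gold = np.array([0, 1, 1, 1, 0, 0])
pred = np.array([0, 0, 1, 1, 0, 0])
print(frame_iou(pred, gold))  # 0.666...
```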
Finally, I'm sharing the code for our table. If you fill in your numbers, we might be able to spot some stark or consistent difference.
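The table code itself did not survive in this thread, so the snippet below is only a hypothetical stand-in showing the kind of fill-in-your-numbers comparison meant here; the E1 figures are the ones quoted above, and the row format is illustrative.

```python
# Hypothetical stand-in for the table script. The E1 numbers are taken from
# this thread; other experiments would be filled in the same way.
rows = {
    # exp: ((reported sentence IoU, reported sign IoU),
    #       (reproduced sentence IoU, reproduced sign IoU))
    "E1": ((79.0, 63.0), (79.0, 67.0)),
}

for exp, ((sent_r, sign_r), (sent_o, sign_o)) in rows.items():
    # Print a LaTeX-style row: reproduced value with delta to reported.
    print(f"{exp} & {sent_o:.1f} ({sent_o - sent_r:+.1f}) "
          f"& {sign_o:.1f} ({sign_o - sign_r:+.1f}) \\\\")
```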
|
We reran E1 and E2 and created the following table, which contains our values along with the deltas to the reported values. E1 trained for 20 epochs, E2 for 42, E3 for 80, and E4 for 69.

[LaTeX results table not recoverable from this export; it listed each experiment's values with deltas, with the E0 row citing \citet{10.1007/978-3-030-66096-3_17}.]

Our E1 values seem closer to the reported values than the first time we ran it; however, E2 is still quite different. Our time values are also quite off, but this can be explained by the use of disklist instead of keeping the data in memory. We double-checked our code, and as far as we can tell nothing was modified apart from the addition of disklist. Our IoU thresholds should both be 50.
|
So we are comparing our results. Personally, I am more surprised by your higher IoU for E4 than by the lower scores, but both are suspicious. In any case, after comparing the commands (E1, E2, E3, and E4), I find that E2 has one thing that is unique:

```
python -m sign_language_segmentation.src.train --dataset=dgs_corpus --pose=holistic --fps=25 --hidden_dim=256 --encoder_depth=1 --encoder_bidirectional=true --pose_components POSE_LANDMARKS LEFT_HAND_LANDMARKS RIGHT_HAND_LANDMARKS FACE_LANDMARKS --pose_reduce_face=true
```

Specifically, the settings:

```python
{
"pose_components": ["POSE_LANDMARKS", "LEFT_HAND_LANDMARKS", "RIGHT_HAND_LANDMARKS", "FACE_LANDMARKS"],
"pose_reduce_face": True
}
```

Tagging @J22Melody to see if there is something else to check.
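For anyone probing whether the face handling explains the e2 gap: conceptually, --pose_reduce_face=true keeps only a subset of the 468 MediaPipe face-mesh points rather than all of them. The numpy sketch below illustrates the idea only; the repository's actual point selection is not reproduced here, and the every-8th-point subset is arbitrary.

```python
import numpy as np

# Hypothetical frame tensor: (frames, points, xyz) for the FACE_LANDMARKS
# component of MediaPipe Holistic (468 face-mesh points).
face = np.random.rand(100, 468, 3)

# Illustrative subset: keep every 8th point. The real --pose_reduce_face=true
# selection in the repository may use a curated landmark list instead.
reduced_index = np.arange(0, 468, 8)
reduced_face = face[:, reduced_index, :]

print(face.shape, "->", reduced_face.shape)  # (100, 468, 3) -> (100, 59, 3)
```
|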
Here are the arguments printed for each experiment (a parsing sketch follows the list below). I've also attached the complete log files. It looks like these arguments are set correctly for e2. E0:
E1:
E2:
E3:
E4:
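One way to compare these settings mechanically rather than by eye is to pull the argument dicts out of the attached e0.txt through e4.txt logs and diff them. This sketch assumes each log contains a single Python-literal dict of arguments, which is an assumption about the log format, not something documented here.

```python
import ast
import re
from pathlib import Path

def extract_args(log_path):
    """Return the first dict literal found in a log file, assuming the
    training script prints its arguments as a Python dict."""
    text = Path(log_path).read_text()
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match is None:
        return {}
    return ast.literal_eval(match.group(0))

logs = {f"e{i}": f"e{i}.txt" for i in range(5)}
args = {name: extract_args(path) for name, path in logs.items()}

# Report any argument whose value differs between experiments.
all_keys = set().union(*(a.keys() for a in args.values()))
for key in sorted(all_keys):
    values = {name: a.get(key) for name, a in args.items()}
    if len(set(map(repr, values.values()))) > 1:
        print(key, values)
```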
|
We ran the experiments reported in the paper three times, with different seeds. You can see in Table 4 that the results vary between runs. I will check your logs in detail and compare them to ours, to see whether I can spot something that might lead to the different results.
|
We were able to get e0 - e4 running on our hardware setup. To get around the memory issue, we stored the dataset in a DiskList, a drop-in replacement for the standard Python list that stores the data on disk instead of in memory. We are running the training processes with the commands supplied on GitHub. The results we got are in the attached files.
We noticed for e1 and e2 that our scores are lower than what was reported in the paper. We were wondering if our results are around what you would expect, or if there could be an issue with our setup. Our end goal is to reproduce the results in the paper, so any advice on how we could modify the code to be more in line with the experiments in the paper would be much appreciated.
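For reference, this is roughly how the swap looks, assuming the disklist package from PyPI, whose DiskList mirrors the basic list API (append, indexing, iteration) while keeping items on disk:

```python
from disklist import DiskList  # pip install disklist

# Instead of accumulating every training example in an in-memory list,
# append them to a DiskList, which spills the items to disk.
examples = DiskList()
for i in range(1000):
    examples.append({"id": i, "pose": [0.0] * 10})  # toy payload

print(len(examples), examples[0]["id"])  # behaves like a regular list
for item in examples:                    # iteration also works
    pass
```

The per-item serialization to disk would also account for the slower wall-clock times mentioned above.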
e1.txt
e4.txt
e3.txt
e2.txt
e0.txt