
Good work, enjoyed reading it. And some questions about the details of the implementation #3

Open · VPeterV opened this issue Apr 23, 2022 · 2 comments


VPeterV commented Apr 23, 2022

Hi! I really like this work. The paper is very precise and readable. But I am still curious about a few details of how the potential functions are computed.

  1. To my understanding, if the model learns well, sum_{y_t} psi_{st}(y_s, y_t) will be equal to the psi_s(y_s) that the model learns. I notice that in this implementation, when computing the edge potential function, the denominator is computed by `sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps` and `sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps` instead of by using `pred_node` (see the sketch at the end of this comment). So I am curious: have you tested using `pred_node` instead? If yes, is the performance sensitive to this choice?
  2. I also notice that this denominator is scaled by `norm_coef`, and I find that the denominator sometimes takes a very small value in log-space. Is the model sensitive to this hyperparameter? If so, do you think that is caused by numerical stability issues, or simply by the model's ability to learn this probability, since the graph is sometimes sparse?

Thanks in advance :)
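For concreteness, here is a minimal sketch of the computation I am asking about in question 1. Only the two `sum_s`/`sum_t` lines are taken from the repo; the shapes and the final log-ratio are my assumptions:

```python
import torch

# Assumption: `logits` is a normalized joint distribution over the labels of
# an edge's two endpoints, with shape [num_edges, C, C].
num_edges, C, eps = 8, 4, 1e-6
scores = torch.randn(num_edges, C, C)
logits = torch.softmax(scores.view(num_edges, -1), dim=-1).view(num_edges, C, C)

# The two lines from the repo: marginalize out one endpoint of each edge.
sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps  # marginal over y_s, [num_edges, C, 1]
sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps  # marginal over y_t, [num_edges, 1, C]

# My reading of the edge potential: the log-ratio of the joint to the product
# of the marginals. The question is whether `pred_node` could replace these
# marginals in the denominator.
log_edge_potential = torch.log(logits) - torch.log(sum_s) - torch.log(sum_t)
```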

mnqu (Collaborator) commented Apr 25, 2022

Thanks for your interest!

  1. For psi_s(y_s), we actually tried both options, i.e., (1) directly using `pred_node` or (2) using sum_{y_t} psi_{st}(y_s, y_t). The two options yielded close results, and we used option (2) in the model.

  2. You are right that the denominator can be very small in log-space. This is because sum_s and sum_t in the denominator tend to be one-hot vectors (i.e., one dimension close to 1 and the others close to 0), so we obtain very negative values after taking the logarithm. These values might cause numerical stability issues. To address them, we tried a few options: (1) adding a hyperparameter `norm_coef`, as in the current code; (2) using a larger `eps` to make sum_s and sum_t smoother; (3) adding an annealing temperature to make sum_s and sum_t smoother (a sketch of all three is below). These options also yielded similar results, and we picked option (1) because of its simplicity. In this case, the results are quite sensitive to `norm_coef`.
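A minimal sketch of the three options, combined into one function. The function name, shapes, and default values here are illustrative, not the exact code from the repo:

```python
import torch

def stabilized_log_denominator(logits, eps=1e-6, norm_coef=0.1, temperature=1.0):
    # Assumption: `logits` is a normalized joint over an edge's two labels,
    # shape [num_edges, C, C]; names and defaults are illustrative.
    probs = logits
    if temperature != 1.0:
        # Option (3): annealing; temperature > 1 flattens near-one-hot joints
        # before marginalizing, so the marginals become smoother too.
        flat = probs.reshape(probs.size(0), -1) ** (1.0 / temperature)
        probs = (flat / flat.sum(dim=-1, keepdim=True)).reshape_as(logits)
    # Option (2): a larger eps keeps the marginals bounded away from zero.
    sum_s = torch.sum(probs, dim=2).unsqueeze(2) + eps  # marginal over y_s
    sum_t = torch.sum(probs, dim=1).unsqueeze(1) + eps  # marginal over y_t
    # Option (1): norm_coef rescales the (possibly very negative) log values.
    return norm_coef * (torch.log(sum_s) + torch.log(sum_t))
```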

Thank you again for your interest, and let me know if you have any further questions.


VPeterV commented Apr 26, 2022

Wow, a comprehensive and detailed answer. It is very helpful. Thanks a lot!
