
Good work, enjoyed reading it. And some questions about the details of the implementation #3

Open · VPeterV opened this issue Apr 23, 2022 · 2 comments


VPeterV commented Apr 23, 2022

Hi! I really like this work. The paper is very precise and readable. But I am still curious about a few details of how the potential functions are computed.

  1. To my understanding, if the model learns well, sum_{y_t} psi_{st}(y_s, y_t) will be equal to the psi_s(y_s) that the model learns. I notice that in this implementation, when computing the edge potential function, the denominator is computed by `sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps` and `sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps` instead of by using `pred_node` (see the sketch at the end of this comment). So I am curious: have you tested using `pred_node` instead? If yes, is the performance sensitive to this choice?
  2. I also notice that this denominator is scaled by `norm_coef`, and I find that the denominator sometimes takes a very small value in log-space. Is the model sensitive to this hyperparameter? If so, do you think that is caused by numerical stability issues, or simply by the model's ability to learn this probability, since the graph is sometimes sparse?

Thanks in advance :)
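For concreteness, here is a minimal sketch of the computation I am asking about in question 1. Only the two `sum_s`/`sum_t` lines are taken from the repo; the shapes and the final log-ratio are my assumptions:

```python
import torch

# Assumption: `logits` is a normalized joint distribution over the labels of
# an edge's two endpoints, with shape [num_edges, C, C].
num_edges, C, eps = 8, 4, 1e-6
scores = torch.randn(num_edges, C, C)
logits = torch.softmax(scores.view(num_edges, -1), dim=-1).view(num_edges, C, C)

# The two lines from the repo: marginalize out one endpoint of each edge.
sum_s = torch.sum(logits, dim=2).unsqueeze(2) + eps  # marginal over y_s, [num_edges, C, 1]
sum_t = torch.sum(logits, dim=1).unsqueeze(1) + eps  # marginal over y_t, [num_edges, 1, C]

# My reading of the edge potential: the log-ratio of the joint to the product
# of the marginals. The question is whether `pred_node` could replace these
# marginals in the denominator.
log_edge_potential = torch.log(logits) - torch.log(sum_s) - torch.log(sum_t)
```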

mnqu (Collaborator) commented Apr 25, 2022

Thanks for your interest!

  1. For psi_s(y_s), we actually tried both options, i.e., (1) directly using `pred_node` or (2) using sum_{y_t} psi_{st}(y_s, y_t). The two options yielded close results, and we used option (2) in the model.

  2. You are right that the denominator can be very small in log-space. This is because sum_s and sum_t in the denominator tend to be one-hot vectors (i.e., one dimension close to 1 and the others close to 0), so we obtain very negative values after taking the logarithm. These values might cause numerical stability issues. To address them, we tried a few options: (1) adding a hyperparameter `norm_coef`, as in the current code; (2) using a larger `eps` to make sum_s and sum_t smoother; (3) adding an annealing temperature to make sum_s and sum_t smoother (a sketch of all three is below). These options also yielded similar results, and we picked option (1) because of its simplicity. In this case, the results are quite sensitive to `norm_coef`.
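A minimal sketch of the three options, combined into one function. The function name, shapes, and default values here are illustrative, not the exact code from the repo:

```python
import torch

def stabilized_log_denominator(logits, eps=1e-6, norm_coef=0.1, temperature=1.0):
    # Assumption: `logits` is a normalized joint over an edge's two labels,
    # shape [num_edges, C, C]; names and defaults are illustrative.
    probs = logits
    if temperature != 1.0:
        # Option (3): annealing; temperature > 1 flattens near-one-hot joints
        # before marginalizing, so the marginals become smoother too.
        flat = probs.reshape(probs.size(0), -1) ** (1.0 / temperature)
        probs = (flat / flat.sum(dim=-1, keepdim=True)).reshape_as(logits)
    # Option (2): a larger eps keeps the marginals bounded away from zero.
    sum_s = torch.sum(probs, dim=2).unsqueeze(2) + eps  # marginal over y_s
    sum_t = torch.sum(probs, dim=1).unsqueeze(1) + eps  # marginal over y_t
    # Option (1): norm_coef rescales the (possibly very negative) log values.
    return norm_coef * (torch.log(sum_s) + torch.log(sum_t))
```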

Thank you again for your interest, and let me know if you have any further questions.


VPeterV commented Apr 26, 2022

Wow, a comprehensive and detailed answer. It is very helpful. Thanks a lot!
