
Architecture of the aggregation in HeteroSAGEConv? #29

Open
anniekmyatt opened this issue Jul 24, 2021 · 1 comment

Comments

anniekmyatt commented Jul 24, 2021

Hello!
Not really an issue, but I have a question about the implementation of the update step in hetero_gnn.py. What is the benefit of calculating the output via these lines:

aggr_out = self.lin_neigh(aggr_out)
node_feature_self = self.lin_self(node_feature_self)
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)

so applying a linear layer to the aggregated neighbour features and another linear layer to the features of the node itself, and afterwards applying a third layer to the concatenation of the results? In terms of the weight matrix multiplications this represents something like

aggr_out = W_y · CONCAT(W_n · h_N(v) + b_n, W_s · h_v + b_s) + b_y

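For reference, this is roughly how I understand the layers behind that snippet to be set up (a sketch based only on the quoted code, not the exact hetero_gnn.py source, so the shapes are my assumptions):

```python
import torch
import torch.nn as nn

# Sketch only: layer shapes consistent with the quoted update step.
# in_channels_neigh / in_channels_self / out_channels are assumed constructor
# arguments; the real HeteroSAGEConv may differ in details.
class HeteroSAGEConvSketch(nn.Module):
    def __init__(self, in_channels_neigh, in_channels_self, out_channels):
        super().__init__()
        self.lin_neigh = nn.Linear(in_channels_neigh, out_channels)   # W_n, b_n
        self.lin_self = nn.Linear(in_channels_self, out_channels)     # W_s, b_s
        self.lin_update = nn.Linear(2 * out_channels, out_channels)   # W_y, b_y

    def update(self, aggr_out, node_feature_self):
        aggr_out = self.lin_neigh(aggr_out)                    # transform aggregated neighbours
        node_feature_self = self.lin_self(node_feature_self)   # transform the node's own features
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)                       # mix the two halves
```
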
I thought it would be simpler to use just
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)

where self.lin_update is now initialised as self.lin_update = nn.Linear(self.in_channels_self + self.in_channels_neigh, self.out_channels), so that the linear layers self.lin_neigh and self.lin_self are no longer needed?

This represents something like

aggr_out = W'_y · CONCAT(h_N(v), h_v) + b'_y

where CONCAT is the vector concatenation operator and the prime indicates that W'_y and b'_y now have a different input dimension.
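For concreteness, here is a sketch of the simpler variant I have in mind (my suggested modification, not the existing DeepSNAP code):

```python
import torch
import torch.nn as nn

# Sketch of the suggested simplification: a single linear layer applied
# directly to the concatenation of the aggregated neighbour features and the
# node's own features.
class SimplifiedHeteroSAGEConvSketch(nn.Module):
    def __init__(self, in_channels_neigh, in_channels_self, out_channels):
        super().__init__()
        self.lin_update = nn.Linear(in_channels_neigh + in_channels_self,
                                    out_channels)              # W'_y, b'_y

    def update(self, aggr_out, node_feature_self):
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)
```
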

In terms of the number of parameters in the model it doesn't make a huge difference, but by including these additional layers you have a more complex optimisation surface that involves a product of weight matrices. Would this not make it a bit harder for gradient descent to reach a good solution?
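To spell out what I mean by a product of weight matrices (my own expansion, assuming no nonlinearity between the three layers): splitting W_y into the blocks acting on the two halves of the concatenation, W_y = [W_y1 | W_y2], the implemented update computes

aggr_out = W_y1 · W_n · h_N(v) + W_y2 · W_s · h_v + (W_y1 · b_n + W_y2 · b_s + b_y)

so the effective weights on the neighbour and self features are the products W_y1 · W_n and W_y2 · W_s.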

Thank you for any explanation you can provide for the benefits of the slightly more complex architecture implemented in deepsnap!

zechengz (Collaborator) commented Jul 29, 2021

Hi,

The idea of using separate linear layers for the self and neighbor features is mainly derived from the Relational GCN, which is briefly described on page 13 of these slides. Adding another layer at the end may play the role of a post-processing layer, which is introduced on page 52 of these slides. Using a post-processing layer can usually be helpful, as shown in this paper.
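As a rough illustration of the post-processing idea (a generic sketch, not DeepSNAP's actual API): the GNN layers produce node embeddings, and an extra MLP head is applied on top of them before the final prediction.

```python
import torch
import torch.nn as nn

# Generic sketch (not DeepSNAP's API): wrap any GNN that maps node features to
# hidden_dim embeddings and apply a post-processing MLP head on top of them.
class GNNWithPostProcess(nn.Module):
    def __init__(self, gnn, hidden_dim, out_dim):
        super().__init__()
        self.gnn = gnn  # e.g. a stack of HeteroSAGEConv-style layers
        self.post_mp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, *gnn_inputs):
        h = self.gnn(*gnn_inputs)   # node embeddings from message passing
        return self.post_mp(h)      # post-processing head
```
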
