
Architecture of the aggregation in HeteroSAGEConv? #29

Open
anniekmyatt opened this issue Jul 24, 2021 · 1 comment

Comments

anniekmyatt commented Jul 24, 2021

Hello!
Not really an issue, but I have a question about the implementation of the update step in hetero_gnn.py. What is the benefit of calculating the output via these lines:

aggr_out = self.lin_neigh(aggr_out)
node_feature_self = self.lin_self(node_feature_self)
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)

so applying a linear layer to the aggregated neighbour features and another linear layer to the features of the node itself, and afterwards applying a third layer to the concatenation of the results? In terms of the weight matrix multiplications this represents something like

aggr_out = W_y · CONCAT(W_n · h_N(v) + b_n, W_s · h_v + b_s) + b_y

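For reference, this is roughly how I understand the layers behind that snippet to be set up (a sketch based only on the quoted code, not the exact hetero_gnn.py source, so the shapes are my assumptions):

```python
import torch
import torch.nn as nn

# Sketch only: layer shapes consistent with the quoted update step.
# in_channels_neigh / in_channels_self / out_channels are assumed constructor
# arguments; the real HeteroSAGEConv may differ in details.
class HeteroSAGEConvSketch(nn.Module):
    def __init__(self, in_channels_neigh, in_channels_self, out_channels):
        super().__init__()
        self.lin_neigh = nn.Linear(in_channels_neigh, out_channels)   # W_n, b_n
        self.lin_self = nn.Linear(in_channels_self, out_channels)     # W_s, b_s
        self.lin_update = nn.Linear(2 * out_channels, out_channels)   # W_y, b_y

    def update(self, aggr_out, node_feature_self):
        aggr_out = self.lin_neigh(aggr_out)                    # transform aggregated neighbours
        node_feature_self = self.lin_self(node_feature_self)   # transform the node's own features
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)                       # mix the two halves
```
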
I thought it would be simpler to use just
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)

where self.lin_update is now initialised as self.lin_update = nn.Linear(self.in_channels_self + self.in_channels_neigh, self.out_channels), so that the linear layers self.lin_neigh and self.lin_self are no longer needed?

This represents something like

aggr_out = W'_y · CONCAT(h_N(v), h_v) + b'_y

where CONCAT is the vector concatenation operator and the prime indicates that W'_y and b'_y now have a different input dimension.
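For concreteness, here is a sketch of the simpler variant I have in mind (my suggested modification, not the existing DeepSNAP code):

```python
import torch
import torch.nn as nn

# Sketch of the suggested simplification: a single linear layer applied
# directly to the concatenation of the aggregated neighbour features and the
# node's own features.
class SimplifiedHeteroSAGEConvSketch(nn.Module):
    def __init__(self, in_channels_neigh, in_channels_self, out_channels):
        super().__init__()
        self.lin_update = nn.Linear(in_channels_neigh + in_channels_self,
                                    out_channels)              # W'_y, b'_y

    def update(self, aggr_out, node_feature_self):
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)
```
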

In terms of the number of parameters in the model it doesn't make a huge difference, but by including these additional layers you have a more complex optimisation surface that involves a product of weight matrices. Would this not make it a bit harder for gradient descent to reach a good solution?
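To spell out what I mean by a product of weight matrices (my own expansion, assuming no nonlinearity between the three layers): splitting W_y into the blocks acting on the two halves of the concatenation, W_y = [W_y1 | W_y2], the implemented update computes

aggr_out = W_y1 · W_n · h_N(v) + W_y2 · W_s · h_v + (W_y1 · b_n + W_y2 · b_s + b_y)

so the effective weights on the neighbour and self features are the products W_y1 · W_n and W_y2 · W_s.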

Thank you for any explanation you can provide for the benefits of the slightly more complex architecture implemented in deepsnap!

zechengz (Collaborator) commented Jul 29, 2021

Hi,

The idea of using separate linear layers for the self and neighbor features is mainly derived from the Relational GCN, which is briefly described on page 13 of these slides. Adding another layer at the end may play the role of a post-processing layer, which is introduced on page 52 of these slides. Using a post-processing layer can usually be helpful, as shown in this paper.
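As a rough illustration of the post-processing idea (a generic sketch, not DeepSNAP's actual API): the GNN layers produce node embeddings, and an extra MLP head is applied on top of them before the final prediction.

```python
import torch
import torch.nn as nn

# Generic sketch (not DeepSNAP's API): wrap any GNN that maps node features to
# hidden_dim embeddings and apply a post-processing MLP head on top of them.
class GNNWithPostProcess(nn.Module):
    def __init__(self, gnn, hidden_dim, out_dim):
        super().__init__()
        self.gnn = gnn  # e.g. a stack of HeteroSAGEConv-style layers
        self.post_mp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, *gnn_inputs):
        h = self.gnn(*gnn_inputs)   # node embeddings from message passing
        return self.post_mp(h)      # post-processing head
```
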
