
Implementation request: Attention GRU #387

Open
juesato opened this issue Feb 2, 2017 · 4 comments

Comments

@juesato
Contributor

juesato commented Feb 2, 2017

@nicholas-leonard I'm probably going to write a GRU with attention. I'm curious to get your input on the best way to do this. I'm also happy to contribute it here if you want.

The first option is to modify the current GRU implementation so that every time step takes 3 inputs rather than 2, {x_t, h_t-1, enc}, where enc is the set of encodings being attended over.

I'm not particularly satisfied with this, since there's a lot of nearly duplicated code in handling the forward / backward passes. That's not too bad on its own, since the same duplication already exists across GRU / LSTM. It would also be slower, and it wouldn't be possible to use the trick where the encoding states are multiplied into the embedding space only once, instead of at every time step. I'm not sure how important that is, since anything truly speed-critical needs to be written at a lower level anyway.

The other option would be to write it as a single module, something like SeqGRUAttention. That also seems to involve a lot of code redundancy, for similar reasons. But this way it wouldn't have to worry about playing nicely with Sequencer or repeating the boilerplate in GRU.lua.
I think the major disadvantage of this approach is that it's less transparent what's going on, since the gradients are computed by hand.

I'm slightly leaning towards the second.
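
(For concreteness, below is a minimal sketch of a per-step attention-GRU cell built with nngraph. The layout, sizes, and the choice of dot-product attention, which assumes hiddenSize == encSize, are illustrative assumptions only, not existing code in this repo; Bahdanau-style additive scoring would replace the dot product with an extra Linear + Tanh.)

```lua
-- A minimal sketch (assumptions, not the repo's API): one attention-GRU step
-- built with nngraph, taking {x_t, h_prev, enc} with enc of size
-- batch x seqLen x encSize, and returning h_t.
require 'nn'
require 'nngraph'

local function attentionGRUStep(inputSize, hiddenSize, encSize)
   local x     = nn.Identity()()   -- batch x inputSize
   local hPrev = nn.Identity()()   -- batch x hiddenSize
   local enc   = nn.Identity()()   -- batch x seqLen x encSize

   -- Dot-product attention (assumes hiddenSize == encSize).
   local query   = nn.View(1, -1):setNumInputDims(1)(hPrev)              -- batch x 1 x hiddenSize
   local scores  = nn.Select(2, 1)(nn.MM(false, true)({query, enc}))     -- batch x seqLen
   local alpha   = nn.SoftMax()(scores)                                  -- attention weights
   local alpha3  = nn.View(1, -1):setNumInputDims(1)(alpha)              -- batch x 1 x seqLen
   local context = nn.Select(2, 1)(nn.MM(false, false)({alpha3, enc}))   -- batch x encSize

   -- Each gate sees the input, the previous hidden state, and the context.
   local function gate(activation)
      return activation(nn.CAddTable()({
         nn.Linear(inputSize,  hiddenSize)(x),
         nn.Linear(hiddenSize, hiddenSize)(hPrev),
         nn.Linear(encSize,    hiddenSize)(context)
      }))
   end
   local r = gate(nn.Sigmoid())   -- reset gate
   local z = gate(nn.Sigmoid())   -- update gate

   local rh    = nn.CMulTable()({r, hPrev})
   local hCand = nn.Tanh()(nn.CAddTable()({
      nn.Linear(inputSize,  hiddenSize)(x),
      nn.Linear(hiddenSize, hiddenSize)(rh),
      nn.Linear(encSize,    hiddenSize)(context)
   }))

   -- h_t = (1 - z) .* h_prev + z .* hCand
   local oneMinusZ = nn.AddConstant(1)(nn.MulConstant(-1)(z))
   local hNext = nn.CAddTable()({
      nn.CMulTable()({oneMinusZ, hPrev}),
      nn.CMulTable()({z, hCand})
   })

   return nn.gModule({x, hPrev, enc}, {hNext})
end
```

Either option above would end up computing something like this per step; the difference is mostly whether it lives inside a modified GRU.lua driven by Sequencer, or inside a hand-written SeqGRUAttention that loops over time itself.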

@gaosh

gaosh commented Feb 8, 2017

I implemented the temporal attention model with an LSTM from "Describing Videos by Exploiting Temporal Structure". I made it as SeqLSTMAttention; you can also take a look at this post.

@juesato
Contributor Author

juesato commented Feb 12, 2017

This doesn't seem to be the same thing, unless I'm missing something. I want attention to be integrated into the internal dynamics of the GRU, whereas this module takes an encoding and a hidden state and gives you weights; it would still need to be integrated with a GRU. If there's a clean way to do that, I'd be interested.
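
(One conceivable way to glue such a weights-only module onto a GRU is sketched below; `attn` and `gruStep` are hypothetical modules, not part of this repo: `attn` is assumed to map {h_prev, enc} to normalized weights, and `gruStep` to be a per-step GRU taking {x_aug, h_prev}.)

```lua
-- Hypothetical glue; attn and gruStep are assumed modules, not existing rnn APIs.
local alpha   = attn:forward({hPrev, enc})            -- batch x seqLen attention weights
local context = torch.bmm(                            -- weighted sum of encoder states
   alpha:view(alpha:size(1), 1, -1), enc):squeeze(2)  -- batch x encSize
local xAug    = torch.cat(x, context, 2)              -- batch x (inputSize + encSize)
local hNext   = gruStep:forward({xAug, hPrev})        -- ordinary GRU step on the augmented input
```

This keeps the GRU itself untouched, but the attention still has to be wired into the recurrence at every time step, which is the integration question being raised here.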

@gaosh

gaosh commented Feb 12, 2017

Can you show me exactly which paper you want to implement?

@juesato
Contributor Author

juesato commented Feb 13, 2017

Sure, either Bahdanau 2014 or Luong 2015 would do.
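
(For reference, the two papers differ mainly in how the attention score between the decoder state s and each encoder state h_j is computed:)

Bahdanau 2014 (additive): e_{tj} = v_a^T tanh(W_a s_{t-1} + U_a h_j)
Luong 2015 (multiplicative, "general"): e_{tj} = s_t^T W_a h_j (the "dot" variant drops W_a)

In both, alpha_{tj} = softmax_j(e_{tj}) and the context is c_t = sum_j alpha_{tj} h_j. Bahdanau scores against the previous decoder state, Luong against the current one; in the additive form, U_a h_j can be precomputed once per sequence, which is presumably the "done only once" multiplication mentioned in the first comment.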
