-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GRU implementation #25
Comments
For the GRU implementation, in addition to modifying make_lstm, it will require some tinkering in the training code, as GRU doesn't have a cell state in addition to the hidden state. (so rnn_state_enc is a table with 2*opt.num_layers with LSTM but it should just be opt.num_layers with GRUs). It would be great to have GRU--let us know if/when you've implemented it! Also, apparently GRU is a little harder to optimize with vanilla SGD so you might want to use adagrad/adam/adadelta (we provide adagrad as an option currently). Yeah, we just simply add the hidden states, since doing the concatenation would require some fiddling with initializing the decoder with the last step of the encoder (since encoder would have twice the number of dimensions). I guess you could use do a linear layer to get back to the original dimensions, which would be slightly annoying. Hope this helps! |
Hi! Thanks to your help, i'm currently working on GRU implementation. |
I also am very interested in the GRU implementation! BTW, could @yoonkim elaborate on the part where GRU is harder to optimize using SGD? I don't quite understand why. Or perhaps you can point me toward some references? Thanks. |
i don't really understand why either, but empirically i've found this to be the case (vanilla SGD doesn't really work well with GRU). |
Hi, |
+1, would be interested in this for comparison purposes! |
Hi team. thanks for great work.
I'm currently trying to construct seq2seq model with Bahdanau style, featured by bidirectional encoder - decoder using GRU cell.
This repository, however, doesn't seem to have GRU implementation, which now i'm trying to add.
(If you already have, it would be a lot of help for time saving..!)
Is it just the models.lua/make_lstm function that i need to modify?
And one more question is about bidirectional encoder:
The description of bidirectional encoder options' saying: "hidden states of the corresponding forward/backward LSTMs are added to obtain the hidden representation for that time step."
does it mean it literally adds two hidden states, not the normal concatenation scheme?
and if so, would it be compatible to other codes if i simply change the hidden representation into the concatenation of two hidden states?
thanks for the reply in advance
The text was updated successfully, but these errors were encountered: