an example of a word-level language model using FluxML #357
base: master
Conversation
grad_x = clamp!(gradient[x], -args.clip, args.clip)
# backprop
x .-= lr .* grad_x
In the spirit of promoting best practices, we have https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.ClipValue and the rest of https://fluxml.ai/Flux.jl/stable/training/optimisers/ for this. I imagine you'd want to use something more sophisticated than plain SGD anyhow.
@ToucheSir I opted out of the library methods because this explicit calculation gave better performance. However, I agree about promoting best practices, and I can switch to using the library methods.
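For reference, a minimal sketch of what the library-based clipping could look like, assuming the Flux.Optimise API linked above with an ADAM optimiser; the names `model`, `loss`, `data`, `args.clip`, and `args.lr` are stand-ins for whatever the script actually uses:

```julia
using Flux
using Flux.Optimise: Optimiser, ClipValue, ADAM

# Compose value clipping with ADAM; clipping is applied before the ADAM step.
ps  = Flux.params(model)
opt = Optimiser(ClipValue(args.clip), ADAM(args.lr))

for (x, y) in data
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)  # clipping happens inside update!
end
```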
# logit cross entropy loss function
function loss(x, y)
    Flux.reset!(model)
Would highly recommend moving this outside of the loss function (i.e. into the training loop).
@ToucheSir Can you elaborate on this? My understanding is that the loss function is called once per batch within a single training loop, so for a sequential language model like this one we want to reset the hidden state after each batch.
I presume the suggestion is to keep it inside the `for batch in data_loader` loop, but outside the `gradient` call.
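A minimal sketch of that arrangement, assuming the usual Flux.params/update! training pattern; `ps`, `opt`, and `data_loader` are stand-in names:

```julia
# Reset the recurrent state once per batch, outside the gradient call,
# so the loss function itself no longer needs to call Flux.reset!.
for (x, y) in data_loader
    Flux.reset!(model)                    # reset here, in the training loop
    gs = gradient(() -> loss(x, y), ps)   # loss(x, y) without the reset
    Flux.Optimise.update!(opt, ps, gs)
end
```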
hold_out = zip(x_train[end-5:end], y_train[end-5:end])

# used for updating hyperparameters
best_val_loss = nothing
Suggested change:
-  best_val_loss = nothing
+  local best_val_loss
Co-authored-by: Brian Chen <[email protected]>
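A rough sketch of the intent behind the suggestion, in case it helps: declaring the variable with `local` (rather than initialising it to `nothing`) lets the loop body assign to the enclosing variable without it ever holding a `Nothing` value. The surrounding `train` function, epoch count, and hold-out evaluation below are hypothetical:

```julia
# Hypothetical surrounding code illustrating the `local` declaration.
function train(data, hold_out; epochs = 30)
    local best_val_loss                   # declared, but not yet assigned
    for epoch in 1:epochs
        # ... one epoch of training ...
        val_loss = sum(loss(x, y) for (x, y) in hold_out)
        if !(@isdefined best_val_loss) || val_loss < best_val_loss
            best_val_loss = val_loss      # first assignment fixes the type
        end
    end
    return best_val_loss
end
```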