Quality of life and helper callback functions #237
Conversation
I have somehow broken something with these changes.
I think I have a lead on what the issue is: because of how the embeddings get cached and reused, the computational graph ends up broken. This needs a bit of thought to fix...
Confirming this by changing out the following block:

```python
if "embeddings" in batch:
    embeddings = batch.get("embeddings")
else:
    embeddings = self.encoder(batch)
    batch["embeddings"] = embeddings
outputs = self.process_embedding(embeddings)
return outputs
```

Removing the branch and just running the encoder + processing the embeddings works (i.e. don't try to grab cached embeddings). Ideally there would be a way to check whether embeddings originated from the same computational graph, but that takes a lot more surgery than this PR warrants. I'll think of an alternative to this. The reason we are stashing the embeddings is to benefit the multitask case, where we would want to avoid running the encoder X times for X tasks and datasets.
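For what it's worth, a minimal sketch of one partial guard, assuming for illustration that the cached value is a plain tensor: only reuse it while it still carries autograd history. This only confirms that *some* graph exists, not that it is the current one (which is the harder check mentioned above), and the helper name is hypothetical, not what this PR implements:

```python
import torch
from torch import nn

def fetch_embeddings(encoder: nn.Module, batch: dict) -> torch.Tensor:
    # Hypothetical guard: reuse cached embeddings only if they are still
    # attached to an autograd graph. A tensor computed under no_grad() or
    # detached from its graph has grad_fn=None and would silently break
    # backpropagation if it were reused in a new training step.
    cached = batch.get("embeddings")
    if isinstance(cached, torch.Tensor) and cached.grad_fn is not None:
        return cached
    embeddings = encoder(batch)
    batch["embeddings"] = embeddings
    return embeddings
```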
Some great features here, thanks for doing all of this! I threw in a few comments, I know you're still working on things.
That way we don't do a double log, as forward might be called multiple times.
This will still only function for wandb/tensorboard, but supports multiple loggers.
This addresses the issue of computational graphs breaking.
This will bring some great utilities and helpful debugging tools! Looks good to merge.
This PR introduces a bunch of changes pertaining to informing the user of what is happening under the hood, particularly during training.
One of the big philosophical changes is also a focus on enabling logging to be done with `TensorBoardLogger` and `WandbLogger` by writing functions more tailored to them, rather than treating loggers entirely in the abstract as before.
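As a rough illustration of what writing logger-tailored functions can look like (the helper name and structure below are assumptions for illustration, not code from this PR), a utility might dispatch on the concrete logger type and use each backend's native API:

```python
import torch
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger

def log_embeddings_for(logger, embeddings: torch.Tensor, step: int) -> None:
    # Hypothetical helper: branch on the concrete logger rather than the
    # abstract Logger interface, so backend-specific features can be used.
    if isinstance(logger, TensorBoardLogger):
        # TensorBoard's SummaryWriter can store vectors for the projector view.
        logger.experiment.add_embedding(embeddings.detach().cpu(), global_step=step)
    elif isinstance(logger, WandbLogger):
        import wandb

        # wandb has no projector equivalent; log a histogram of the values instead.
        logger.experiment.log(
            {"embeddings": wandb.Histogram(embeddings.detach().cpu().numpy())},
            step=step,
        )
```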
Summary

- `log_embeddings` and `log_embeddings_every_n_steps` arguments that are saved to `hparams`, which, as the pair suggests, allow you to regularly log embedding vectors for analysis. This will let you ensure oversmoothing doesn't occur, where all of the embedding features become identical.
- `TrainingHelperCallback`, which is intended to help diagnose some common issues with training, such as unused parameters, missing gradients, tiny gradients, etc. Complementary to the change above, there is an option to inject a forward hook into any encoder (assuming it produces an `Embeddings` structure) and use it to calculate the variance in the embeddings; a rough sketch of that idea follows this list.
- `ModelAutocorrelation` callback, which will perform an autocorrelation analysis on model parameters and gradients over the course of training. Basically this gives you some insight into what the training dynamics look like, i.e. too much correlation = probably not good; a toy version of the computation is also sketched below.
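For the forward-hook idea in the second bullet, a minimal sketch (the function name and the `.embeddings` attribute are placeholders, not the callback's actual code, and assume the encoder output exposes its embedding tensor):

```python
import torch
from torch import nn

def attach_variance_hook(encoder: nn.Module, history: list):
    # Hypothetical forward hook: after every encoder forward pass, record the
    # variance of the embedding features. Variance collapsing toward zero is
    # a symptom of oversmoothing (all embedding features becoming identical).
    def _hook(module, inputs, output):
        # ".embeddings" is a placeholder attribute for the tensor held inside
        # the Embeddings structure; adjust to the real field name.
        tensor = output if isinstance(output, torch.Tensor) else output.embeddings
        history.append(tensor.detach().var().item())

    # register_forward_hook returns a handle that can later be .remove()'d.
    return encoder.register_forward_hook(_hook)
```

And for the third bullet, the kind of quantity an autocorrelation analysis tracks, shown on a single scalar trajectory (a toy computation, not the callback's implementation):

```python
import torch

def lag_autocorrelation(history: torch.Tensor, lag: int = 1) -> float:
    # history: 1D tensor of a scalar tracked over training steps, e.g. a
    # parameter's mean value or a gradient norm recorded at each step.
    assert 1 <= lag < history.numel()
    x = history - history.mean()
    num = (x[:-lag] * x[lag:]).sum()
    denom = (x * x).sum()
    # Close to 1: the quantity barely changes between steps (strongly
    # correlated updates); close to 0: little step-to-step memory.
    return (num / denom).item()
```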
My intention for the `TrainingHelperCallback` is for it to act like a guide for best practices: we can refine this as we go and discover new things, and hopefully it will be useful for everyone, including new users.
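As a usage sketch (the import path and constructor arguments are assumptions, since they aren't spelled out in this description), the new callbacks would be passed to the Lightning `Trainer` like any other callback:

```python
import pytorch_lightning as pl

# Hypothetical import path and default constructor arguments; check the PR
# diff for where the callbacks actually live and which options they accept.
from matsciml.lightning.callbacks import TrainingHelperCallback, ModelAutocorrelation

trainer = pl.Trainer(
    max_epochs=10,
    callbacks=[TrainingHelperCallback(), ModelAutocorrelation()],
)
```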