Allow any model architecture #3

Open · wants to merge 22 commits into main
Conversation

ClashLuke (Collaborator)

No description provided.

ClashLuke marked this pull request as ready for review — January 15, 2023, 21:15
ClashLuke changed the title from "WIP: Allow any model architecture" to "Allow any model architecture" — January 15, 2023
ClashLuke (Collaborator, Author) commented — January 15, 2023

Works well. See the example notebook.

ClashLuke requested a review from dvruette — January 16, 2023, 12:25
dvruette (Owner) left a comment


Looks great! Ideally we would update at least the test/autoencoder.py example (and the README) for future reference.

I've left some comments where I didn't quite understand.

Another thing I noticed yesterday is that torch.autograd.grad doesn't work with integer tensors as inputs, which happens e.g. when using nn.Embedding. Not sure if this is easily resolvable, so we don't need to address it now, but it's something to keep in mind for the future.
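
For reference, a minimal repro of the integer-input limitation — a sketch only, not from the PR; the toy Embedding, shapes, and the output-side workaround are made up for illustration:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
idx = torch.tensor([1, 2, 3])  # LongTensor, as nn.Embedding expects

loss = emb(idx).sum()
try:
    # Fails: integer tensors cannot require grad, so they are not
    # valid `inputs` for torch.autograd.grad.
    torch.autograd.grad(loss, idx)
except RuntimeError as e:
    print(e)  # e.g. "One of the differentiated Tensors does not require grad"

# One possible workaround: differentiate w.r.t. the (float) embedding
# output rather than the integer indices.
hidden = emb(idx)
(grad_hidden,) = torch.autograd.grad(hidden.sum(), hidden)
print(grad_hidden.shape)  # torch.Size([3, 4])
```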

[Inline review comments on locoprop/layer.py — two marked outdated, all resolved]
dvruette (Owner) commented
While training seems to work with Lightning, the standard training loop breaks for some reason: https://colab.research.google.com/drive/1hNLavl5jYgf7-DxfmnTucyCAcMHS4y4c?usp=sharing

I don't really know what's going on; the only difference between the two is the training loop, AFAICT.
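
For concreteness, this is roughly what the "standard" loop means here — a generic PyTorch sketch; the actual model, data, and LocoProp wiring from the notebook are omitted and the stand-ins below are hypothetical:

```python
import torch

# Hypothetical stand-ins for the notebook's model and data.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(100):
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # plain backward; no Lightning hooks, no scaling
    optimizer.step()
```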

ClashLuke (Collaborator, Author) commented
We get roughly the same convergence with an outer learning rate of 0.01; see this copy of your notebook.
Does the Lightning trainer do some gradient scaling internally?
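
One way to test the gradient-scaling hypothesis — a sketch assuming both runs use the same model; `grad_norm` is a hypothetical helper, not part of the library:

```python
import torch

def grad_norm(model: torch.nn.Module) -> float:
    """Global L2 norm over all parameter gradients."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Plain loop: call grad_norm(model) right after loss.backward().
# Lightning: log the same quantity from the LightningModule's
# on_after_backward() hook and compare. If Lightning applies AMP's
# GradScaler or gradient clipping, the norms will differ.
```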
