unfreezing 🥶 weights callback #297

Open · sebffischer opened this issue Oct 18, 2024 · 0 comments
When fine-tuning a predefined image network on a downstream task, one often wants to freeze some weights for a given number of epochs/steps. As this is relatively common, we should offer a predefined callback ("cb.freeze") to enable this.

The callback should be able to iteratively unfreeze layers after a given number of epochs / batches.

Background:

Each torch module represents its parameters as a named list():

net = torch::nn_linear(1, 1)

net$parameters
#> $weight
#> torch_tensor
#>  0.3468
#> [ CPUFloatType{1,1} ][ requires_grad = TRUE ]
#> 
#> $bias
#> torch_tensor
#>  0.6796
#> [ CPUFloatType{1} ][ requires_grad = TRUE ]
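
The names in this list identify the individual parameters:

names(net$parameters)
#> [1] "weight" "bias"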

When we want to unfreeze a specific weight, we can refer to it via its name in this list.
Further, we can freeze a parameter in the network by setting requires_grad to FALSE via the in-place $requires_grad_() method:

net$parameters[[1]]$requires_grad
#> [1] TRUE
net$parameters[[1]]$requires_grad_(FALSE)
net$parameters[[1]]$requires_grad
#> [1] FALSE

We can unfreeze a parameter the same way:

net$parameters[[1]]$requires_grad_(TRUE)
net$parameters[[1]]$requires_grad
#> [1] TRUE
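
In the callback, this toggling has to be applied across many parameters at once. Here is a minimal sketch, using a hypothetical helper freeze_all_but() (not part of torch or mlr3torch) that keeps only the parameters whose names match a pattern trainable:

freeze_all_but = function(network, keep) {
  # hypothetical helper: freeze every parameter whose name does not
  # match the regular expression `keep`
  for (nm in names(network$parameters)) {
    network$parameters[[nm]]$requires_grad_(grepl(keep, nm))
  }
}

net = torch::nn_linear(1, 1)
freeze_all_but(net, keep = "bias")
sapply(net$parameters, function(p) p$requires_grad)
#> weight   bias
#>  FALSE   TRUE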

The callback needs to define

  1. when to unfreeze which layer (the "when" should be definable both in terms of epochs and batches).
    It should e.g. be possible to unfreeze layer8 after the first epoch, layer7 after the third, and the rest after the third epoch (see the sketch below).
  2. which weights are frozen at the start, i.e. which weights are trainable from the start.
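
A rough skeleton, assuming the callback is built with mlr3torch's torch_callback() constructor (the stage names and what the context exposes here are assumptions about the final design, not existing guarantees):

library(mlr3torch)

# Sketch only: the stages used and the context access are assumptions.
cb_unfreeze = torch_callback("unfreeze",
  on_begin = function() {
    # freeze every weight not selected to be trainable from the start
  },
  on_epoch_begin = function() {
    # look up the current epoch in the unfreeze schedule and call
    # $requires_grad_(TRUE) on the parameters selected for it
  }
)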

I can e.g. imagine this callback to have the following parameters:

  • start: a Selector (see the affect_columns parameter in mlr3pipelines) that defines which weights will be trained from the start (maybe a better name exists for this parameter).
  • unfreeze: a data.table() with a column weights (a list() column containing Selectors) and a column epoch OR batch.
    If we had something like:
    unfreeze = data.table(
      epoch = c(1, 2),
      weights = list(selector_name("some_layer"), selector_invert(selector_name("last_layer")))
    )
    this should be interpreted as unfreezing module$parameters$some_layer after the first epoch and the rest after the second epoch.
    If the column in the data.table is named "batch" instead of "epoch", this should work just the same, but after n batches instead of n epochs.
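
To make these semantics concrete, here is a minimal sketch of the unfreezing step itself. apply_unfreeze() is hypothetical, and it assumes a selector here is a function mapping the vector of parameter names to the selected subset (analogous to, but not literally, mlr3pipelines selectors, which operate on tasks):

apply_unfreeze = function(network, unfreeze, current_epoch) {
  # hypothetical: unfreeze all parameters scheduled for `current_epoch`
  for (i in which(unfreeze$epoch == current_epoch)) {
    # assumption: a selector maps parameter names to a subset of them
    selected = unfreeze$weights[[i]](names(network$parameters))
    for (nm in selected) {
      network$parameters[[nm]]$requires_grad_(TRUE)
    }
  }
}

Called from the epoch hook (or the batch analogue when the column is named "batch"), this would implement the schedule above.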