In my model implementation, I would like to freeze the transformer (using `roberta-base` in a `Tok2VecTransformer.v1`) for the first 2 epochs of training. From this spaCy documentation, it seems like it should be possible to set `grad_factor` to 0 in order to disable gradients from one of the listeners. Setting this up per epoch should then be possible, according to the same documentation, by using a scheduler. In my config, I have specified the `constant_then` scheduler followed by another `constant` scheduler in the following way:
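Roughly, the relevant part of the config looks like this (the concrete numbers are placeholders; `steps` stands in for however many update steps make up the first 2 epochs):

```ini
[components.seq2labels.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecTransformer.v1"
name = "roberta-base"
tokenizer_config = {"use_fast": true}

[components.seq2labels.model.tok2vec.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.seq2labels.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

# Attempted schedule for grad_factor: 0.0 for the first N steps (~2 epochs),
# then the nested constant schedule takes over with 1.0
[components.seq2labels.model.tok2vec.grad_factor]
@schedules = "constant_then.v1"
rate = 0.0
steps = 1000

[components.seq2labels.model.tok2vec.grad_factor.schedule]
@schedules = "constant.v1"
rate = 1.0
```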
When initializing, I get the following error:

```
=========================== Initializing pipeline ===========================
✘ Config validation error
seq2labels.model.tok2vec -> grad_factor   value is not a valid float
```
It seems to me that the scheduler may be returning an iterator instead of a float that can be used as a value here. Have I overlooked some aspect that should still be implemented/amended?
Otherwise, if this scheduler does not work with grad_factor, is there another way to freeze the transformer only for the first 2 epochs of training?
Thanks for any help in advance :)
This is basically because `grad_factor` isn't designed to take a sequence of values, like an iterator, as you note. That's not just an oversight: the transformers model isn't designed to support a sequence there at the moment.
If you look at a place where the value can be a sequence or a float, like the learn rate in Adam, you'll see that the type is annotated as `FloatOrSeq`. In contrast, `grad_factor` is just a `float`.
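For example, the optimizer block in a typical training config can take a schedule sub-block for the learn rate, because that value is `FloatOrSeq` (something along these lines, as in the default configs):

```ini
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999

# learn_rate is FloatOrSeq, so a schedule block is accepted here
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
```

Trying the same kind of sub-block for `grad_factor` is exactly what triggers the validation error above.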
This also isn't just a type issue: the implementation of the Transformer architecture would need to be changed to work with non-constant values. Looking at it, I don't think it would be complicated.
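Very roughly, and just as an illustrative sketch rather than actual spacy-transformers code (it also assumes thinc's generator-based schedules from thinc 8.x), the model would need to normalize the value into something it can advance once per update:

```python
from typing import Iterator, Union

from thinc.api import constant, constant_then

# Sketch only: accept either a plain float or a schedule (an iterator of floats)
FloatOrSeq = Union[float, Iterator[float]]


def as_per_step_value(value: FloatOrSeq):
    """Return a zero-argument callable giving the grad_factor to use at each update."""
    if isinstance(value, (int, float)):
        fixed = float(value)
        return lambda: fixed
    iterator = iter(value)
    last = [next(iterator)]

    def next_value() -> float:
        current = last[0]
        try:
            last[0] = next(iterator)
        except StopIteration:
            # Keep repeating the final value once the schedule is exhausted
            pass
        return current

    return next_value


# Example: 0.0 for the first 1000 updates, then 1.0 afterwards
grad_factor = as_per_step_value(constant_then(0.0, 1000, constant(1.0)))
```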
I've wanted this feature myself when training models before, so I think we could certainly consider adding it.