The config of the tokenizer used during packing / tokenization of the data (modalities data pack_encoded_data) has an attribute "model_max_length", see here.
While this attribute is not used during packing and therefore has no effect on the results (i.e., the .pbin file), exceeding it triggers many warnings such as "Token indices sequence length is longer than the specified maximum sequence length for this model (1757 > 1024)".
We should avoid these warnings by setting "model_max_length" to a very large value (e.g., in the Tokenizer object).