Why is `pad_to_sequence_len: true` recommended when using sample_packing? #1101
-
As far as I can see, when using sample_packing the effective batch size is 1, and the inputs are padded to a multiple of 64. Beyond this padding, why is it recommended to pad to the sequence length?
Answered by
winglian
Jan 11, 2024
-
This is because feeding batches of varying shape to the GPU often leads to GPU memory usage actually increasing from the varying lengths, even though each batch is still smaller than a fully padded one.
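A minimal sketch of the difference (not axolotl's actual collator; `pad_batch` and `fixed_len` are made-up names for illustration): without a fixed length, every batch tensor has a different shape, whereas padding to the sequence length gives the CUDA caching allocator a single constant shape to reuse.

```python
import torch

def pad_batch(input_ids_list, pad_token_id=0, fixed_len=None):
    """Pad a list of token-id sequences into one (batch, seq) tensor.

    If fixed_len is given, every batch comes out with the same shape
    (the pad_to_sequence_len behaviour); otherwise each batch is only
    padded up to its own longest sequence, so shapes vary batch to batch.
    """
    target = fixed_len or max(len(ids) for ids in input_ids_list)
    batch = torch.full((len(input_ids_list), target), pad_token_id, dtype=torch.long)
    for row, ids in enumerate(input_ids_list):
        batch[row, : len(ids)] = torch.tensor(ids, dtype=torch.long)
    return batch

# Varying shapes: (2, 5) here, perhaps (2, 317) on the next step. The allocator
# ends up caching blocks of many different sizes, and peak memory can creep up.
varying = pad_batch([[1, 2, 3], [4, 5, 6, 7, 8]])

# Fixed shape: always (2, 2048), so the same allocation pattern is reused each step.
fixed = pad_batch([[1, 2, 3], [4, 5, 6, 7, 8]], fixed_len=2048)

print(varying.shape, fixed.shape)  # torch.Size([2, 5]) torch.Size([2, 2048])
```

The point is not that the padded batch holds less data, but that constant tensor shapes let allocated blocks be reused step after step instead of accumulating differently sized cached blocks.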
Answer selected by
RicardoDominguez