Is your feature request related to a problem? Please describe.
My minimal use case: I have two datasets, one at 256x256 and one at 512x512 resolution, and I want to build two dataloaders: one loads the 256x256 images with batch size 8, the other loads the 512x512 images with batch size 2.
This conflicts with the following note in the documentation:
Note: train_batch_size must be equal to train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs. For simplicity, you can choose to only specify two of the three parameters, the last one will be inferred automatically by DeepSpeed.
Given that, how should train_micro_batch_size_per_gpu be chosen?
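For concreteness, here is a small worked sketch (assuming a single GPU and a hypothetical gradient_accumulation_steps of 4; only the two micro batch sizes come from the use case above) showing why one ds_config cannot satisfy the identity for both dataloaders at once:

```python
# Hypothetical values for illustration only.
num_gpus = 1
gradient_accumulation_steps = 4

for name, micro_batch in [("256x256 loader", 8), ("512x512 loader", 2)]:
    train_batch_size = micro_batch * gradient_accumulation_steps * num_gpus
    print(f"{name}: train_batch_size = {micro_batch} * {gradient_accumulation_steps} * {num_gpus} = {train_batch_size}")
```

A single config can only hold one train_micro_batch_size_per_gpu (and hence one train_batch_size), so the identity cannot hold for both loaders at the same time.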
This raises a more fundamental question: how does DeepSpeed perform gradient accumulation?
If it counts forward passes of the model, then no matter what the batch size of each pass is, gradient accumulation stays logically consistent.
If it counts instances, i.e., the number of data samples that have gone through the model (for example, stepping the optimizer once 32 samples have been accumulated), then the 512x512 and 256x256 loaders need different numbers of forward passes to reach that threshold, which creates a logical problem for mixed-batch-size training.
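To make the two interpretations concrete, here is a minimal sketch in plain Python (not DeepSpeed's actual implementation; the alternating micro-batch pattern, the accumulation step count, and the 32-sample threshold are illustrative assumptions):

```python
# Alternating micro-batches from the two loaders: 256x256 (bs=8) and 512x512 (bs=2).
batch_sizes = [8, 2] * 8

# (a) Count forward passes: step every `accumulation_steps` micro-batches.
accumulation_steps = 4
for i, bs in enumerate(batch_sizes, start=1):
    # loss = model(batch); (loss / accumulation_steps).backward()
    if i % accumulation_steps == 0:
        print(f"(a) optimizer.step() after {i} forward passes")

# (b) Count samples: step once the accumulated sample count reaches a target.
samples_per_step = 32
seen = 0
for i, bs in enumerate(batch_sizes, start=1):
    seen += bs
    if seen >= samples_per_step:
        print(f"(b) optimizer.step() after {i} forward passes ({seen} samples)")
        seen = 0
```

Under (a) both loaders advance the accumulation boundary at the same rate; under (b) the small-batch 512x512 loader needs many more forward passes per optimizer step.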
Describe the solution you'd like
Describe in the documentation how DeepSpeed handles gradient accumulation.
Ideally, perform gradient accumulation by counting forward passes of the model.
Relax the requirement that train_batch_size must equal train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs.
thank you for your great work.
No core functionality of DeepSpeed needs batch size information, so the restriction tying batch size, gradient accumulation, and GPU count together can be relaxed/eliminated.
Can you share more details of your dynamic batch size scenario? For example, is it similar to curriculum learning, which scales batch size and sequence length dynamically and works with DeepSpeed?
Can you share a repro for the error you are seeing? In particular, it would be good to see your ds_config and deepspeed.initialize call. There might be a simple workaround similar to the HF integration.
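For reference, a minimal sketch of what such a repro might look like; the toy model, dataset sizes, and config values below are hypothetical placeholders, not the reporter's actual setup:

```python
# Sketch of a two-dataloader setup sharing a single DeepSpeed config.
# Launch with the deepspeed launcher, e.g.: deepspeed repro.py
import torch
import deepspeed
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in model and datasets for the real 256x256 / 512x512 pipelines.
model = torch.nn.Sequential(
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(3, 10)
)
loader_256 = DataLoader(TensorDataset(torch.randn(64, 3, 256, 256)), batch_size=8)
loader_512 = DataLoader(TensorDataset(torch.randn(16, 3, 512, 512)), batch_size=2)

# Only one micro batch size can be declared here, which is the crux of the issue:
# the 512x512 loader actually feeds batches of 2, not 8.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    config=ds_config,
)

for (x_256,), (x_512,) in zip(loader_256, loader_512):
    for x in (x_256, x_512):
        loss = model_engine(x.to(model_engine.device)).mean()
        model_engine.backward(loss)
        model_engine.step()
```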