
[REQUEST] Dynamic model offload support ZeRO-3 inference models #6595

Open
kfertakis opened this issue Oct 1, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@kfertakis

Is your feature request related to a problem? Please describe.
The issue is related to #5620 and #6011. When a DeepSpeed model is initialised for ZeRO-3 inference, for example with a DeepSpeedZeRoOffload instance managing the parameters, the model cannot be moved to the CPU either with torch.nn.Module.to() or with the new offload_states API.
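
For reference, a minimal sketch of the scenario (the model name and config values below are illustrative, not taken from my actual run):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Illustrative ZeRO-3 config with no optimizer, i.e. an inference-only engine;
# the partitioned weights stay in GPU memory after initialisation.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.float16)
engine, *_ = deepspeed.initialize(model=model, config=ds_config)

# Neither call below releases the GPU copy of the ZeRO-3 partitioned weights:
engine.module.to("cpu")   # parameters are managed by ZeRO-3, so .to() does not move them
engine.offload_states()   # does not offload the weights of this inference-only engine
```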

Describe the solution you'd like
Either extend #6011 to support offloading a model configured for ZeRO-3 inference, or add a new API that supports this.

Thanks

@kfertakis kfertakis added the enhancement New feature or request label Oct 1, 2024
@kfertakis kfertakis changed the title [REQUEST] Extend offload_states API to support ZeRO-3 inference models [REQUEST] Extend offload_states API to support ZeRO-3 inference models Oct 1, 2024
@tjruwase
Contributor

tjruwase commented Oct 1, 2024

@kfertakis, can you please clarify your ask here, since:

  1. ZeRO-Inference does not include optimizer state
  2. ZeRO-Inference normally hosts model weights in CPU or NVMe memory (a config sketch is shown below).
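
For context, the usual ZeRO-Inference configuration looks roughly like this (values are illustrative):

```python
# Illustrative ZeRO-3 config for the usual ZeRO-Inference setup: model weights
# are hosted in CPU (or NVMe) memory from the start and streamed to the GPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        # or: "offload_param": {"device": "nvme", "nvme_path": "/path/to/nvme"}
    },
}
```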

@tjruwase
Contributor

tjruwase commented Oct 1, 2024

It might be helpful to use example log/screenshots from the following to demonstrate the problem:
https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/zero_inference/README.md

Thanks!

@kfertakis
Author

@tjruwase, thanks for the example reference. You're right, I should clarify a bit better. The issue does not refer to optimizer states, but rather to the weights of a ZeRO-Inference model that are initially placed in GPU memory.

Indeed, if you configure ZeRO-Inference to host the model weights on the CPU at initialisation time, as with the --cpu-offload option in the example code, GPU memory will not be used. However, the issue I am referring to is when the model is initially placed in GPU memory (no --cpu-offload flag in the example) and then needs to be dynamically moved to CPU memory at runtime, just as the offload_states API (#6011) accomplishes, disregarding the optimizer state, which is not relevant in this case. Using torch.nn.Module.to() or offload_states does not move the DeepSpeed-initialised ZeRO-Inference model to CPU memory.
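
To make the request concrete, here is a sketch of the behaviour I would like, assuming the offload_states / reload_states API from #6011 could also be applied to an engine initialised without an optimizer (model name, inputs, and config values are illustrative):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Weights start on the GPU: ZeRO-3, no offload_param (i.e. no --cpu-offload).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
engine, *_ = deepspeed.initialize(model=model, config=ds_config)

inputs = tokenizer("Hello", return_tensors="pt").to(engine.device)
outputs = engine.module.generate(**inputs, max_new_tokens=8)  # inference on GPU

# Desired: release the GPU copy of the partitioned weights at runtime...
engine.offload_states()   # today this does not move the weights of an engine without an optimizer

# ...and later bring them back when the model is needed again.
engine.reload_states()
```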

Thanks.

@kfertakis kfertakis changed the title [REQUEST] Extend offload_states API to support ZeRO-3 inference models [REQUEST] Dynamic model offload support ZeRO-3 inference models Oct 1, 2024