
[REQUEST] Dynamic model offload support ZeRO-3 inference models #6595

Open
kfertakis opened this issue Oct 1, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@kfertakis

Is your feature request related to a problem? Please describe.
The issue is related to #5620 and #6011. When a DeepSpeed model is initialised for ZeRO-3 inference, for example with a DeepSpeedZeRoOffload instance managing the parameters, the model cannot be moved to the CPU either with torch.nn.Module.to() or with the new offload_states API.
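
For reference, a minimal sketch of the scenario (the model name and config values below are illustrative, not taken from my actual run):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Illustrative ZeRO-3 config with no optimizer, i.e. an inference-only engine;
# the partitioned weights stay in GPU memory after initialisation.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.float16)
engine, *_ = deepspeed.initialize(model=model, config=ds_config)

# Neither call below releases the GPU copy of the ZeRO-3 partitioned weights:
engine.module.to("cpu")   # parameters are managed by ZeRO-3, so .to() does not move them
engine.offload_states()   # does not offload the weights of this inference-only engine
```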

Describe the solution you'd like
Either extend #6011 to support offloading a model configured for ZeRO-3 inference, or add a new API that supports this.

Thanks

@kfertakis kfertakis added the enhancement New feature or request label Oct 1, 2024
@kfertakis kfertakis changed the title [REQUEST] Extend offload_states API to support ZeRO-3 inference models [REQUEST] Extend offload_states API to support ZeRO-3 inference models Oct 1, 2024
@tjruwase
Contributor

tjruwase commented Oct 1, 2024

@kfertakis, can you please clarify your ask here, since:

  1. ZeRO-Inference does not include optimizer state
  2. ZeRO-Inference normally hosts model weights in CPU or NVMe memory (a config sketch is shown below).
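
For context, the usual ZeRO-Inference configuration looks roughly like this (values are illustrative):

```python
# Illustrative ZeRO-3 config for the usual ZeRO-Inference setup: model weights
# are hosted in CPU (or NVMe) memory from the start and streamed to the GPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        # or: "offload_param": {"device": "nvme", "nvme_path": "/path/to/nvme"}
    },
}
```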

@tjruwase
Contributor

tjruwase commented Oct 1, 2024

It might be helpful to use example log/screenshots from the following to demonstrate the problem:
https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/zero_inference/README.md

Thanks!

@kfertakis
Author

@tjruwase, thanks for the example reference. You're right, I should clarify a bit better. The issue does not refer to optimizer states, but rather to the weights of a ZeRO-Inference model that are initially placed in GPU memory.

Indeed, if you configure ZeRO-Inference to host the model weights on the CPU at initialisation time, as with the --cpu-offload option in the example code, GPU memory will not be used. However, the issue I am referring to is when the model is initially placed in GPU memory (no --cpu-offload flag in the example) and then needs to be dynamically moved to CPU memory at runtime, just as the offload_states API (#6011) accomplishes, disregarding the optimizer state, which is not relevant in this case. Using torch.nn.Module.to() or offload_states does not move the DeepSpeed-initialised ZeRO-Inference model to CPU memory.
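
To make the request concrete, here is a sketch of the behaviour I would like, assuming the offload_states / reload_states API from #6011 could also be applied to an engine initialised without an optimizer (model name, inputs, and config values are illustrative):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Weights start on the GPU: ZeRO-3, no offload_param (i.e. no --cpu-offload).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
engine, *_ = deepspeed.initialize(model=model, config=ds_config)

inputs = tokenizer("Hello", return_tensors="pt").to(engine.device)
outputs = engine.module.generate(**inputs, max_new_tokens=8)  # inference on GPU

# Desired: release the GPU copy of the partitioned weights at runtime...
engine.offload_states()   # today this does not move the weights of an engine without an optimizer

# ...and later bring them back when the model is needed again.
engine.reload_states()
```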

Thanks.

@kfertakis kfertakis changed the title [REQUEST] Extend offload_states API to support ZeRO-3 inference models [REQUEST] Dynamic model offload support ZeRO-3 inference models Oct 1, 2024