Query device name from pytorch if only device index is given #500
Conversation
@Narsil: can you please help to review?
Rebased on top of the latest main. @Narsil, @ArthurZucker, @SunMarc: can you please help to review?
Resolved conflict with 2331974. @Narsil @muellerzr @SunMarc: can this PR please be reviewed?
@dvrogozh Can you stop calling it a bug everywhere? It's not a bug; it's a breaking change you are proposing, which you introduced in torch==2.5. The new behavior may be more user friendly on non-CUDA accelerators, but it is nonetheless a breaking change and should be treated as such. This PR introduces a dependency on torch itself, since we would rely on torch to produce the correct string, therefore I cannot take the code as-is. The raison d'être of this code is to provide simple validation.
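For context, a hypothetical sketch (not the actual safetensors code) of the kind of torch-independent validation being referred to here, where the library checks the device specification itself rather than relying on torch to produce a string; the function name and exact checks are illustrative only:

```python
# Hypothetical illustration only -- not the actual safetensors code.
# Validate the device specification without importing torch, so the
# library does not depend on torch's string representation of a device.
import re


def validate_device(device):
    """Accept an int index or a well-formed device string."""
    if isinstance(device, int):
        if device < 0:
            raise ValueError(f"Device index must be non-negative, got {device}")
        return device
    if isinstance(device, str):
        # e.g. "cpu", "cuda", "cuda:0", "xpu:1", "mps", ...
        if re.fullmatch(r"[a-z]+(:\d+)?", device):
            return device
        raise ValueError(f"Invalid device string: {device!r}")
    raise TypeError(f"Unsupported device specification: {device!r}")
```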
I made a cleaner implementation, imho: #509. Can you check that it fixes your issue? You're also more than welcome to steal the code from said PR so we can merge your PR instead of mine and you get the credit.
Co-authored-by: Dmitry Rogozhkin <[email protected]>
Fixes: huggingface#499
Fixes: huggingface/transformers#31941

In some cases only a device index is given when querying a device. In this case both PyTorch and Safetensors returned 'cuda:N' by default, which causes runtime failures if the user actually runs on a non-CUDA device and does not have CUDA at all. This was recently addressed on the PyTorch side by [1]: starting from PyTorch 2.5, calling 'torch.device(N)' returns the current device instead of the CUDA device.

This commit makes a similar change to Safetensors: if only a device index is given, Safetensors queries and returns the device by calling 'torch.device(N)'. The change is backward compatible since this call returns 'cuda:N' on PyTorch <= 2.4, which matches the previous Safetensors behavior.

See [1]: pytorch/pytorch#129119

Signed-off-by: Dmitry Rogozhkin <[email protected]>
Unfortunately it does not. It fails with "RuntimeError: Invalid device string: '0'". See the PyTorch part of the stack at #509 (review).
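For reference, the error is straightforward to reproduce with torch.device directly, assuming the #509 code path at that point forwarded the bare index as a string:

```python
# torch accepts an integer index or a "type[:index]" string,
# but not a bare index passed as a string.
import torch

torch.device(0)         # OK: cuda:0 on torch <= 2.4, current accelerator on >= 2.5
torch.device("cuda:0")  # OK: explicit device type and index
torch.device("0")       # RuntimeError: Invalid device string: '0'
```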
@Narsil: I reworked my PR on top of your proposal made in #509 to abstract the device string parsing (add …).
The logic you kept is the logic I want to get rid of. It's still wrong to depend on torch internals (here, the string representation of the resolved device) in that specific part.
I am fine with this as long as your change addresses the problem. I gave the modified version of #509 a try and it works for me now. We can proceed with your variant if you believe it's more aligned with the safetensors design.
Perfect, done. Thanks again for raising awareness about the upcoming new behavior!
Superseded by #509
Fixes: #499
Fixes: huggingface/transformers#31941
In some cases only a device index is given when querying a device. In this case both PyTorch and Safetensors returned 'cuda:N' by default, which causes runtime failures if the user actually runs on a non-CUDA device and does not have CUDA at all. This was recently addressed on the PyTorch side by [1]: starting from PyTorch 2.5, calling 'torch.device(N)' returns the current device instead of the CUDA device.
This commit makes a similar change to Safetensors: if only a device index is given, Safetensors queries and returns the device by calling 'torch.device(N)'. The change is backward compatible since this call returns 'cuda:N' on PyTorch <= 2.4, which matches the previous Safetensors behavior.
See [1]: pytorch/pytorch#129119
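A minimal sketch of the behavior this describes, not the actual safetensors patch; the `resolve_device` helper name is hypothetical:

```python
# Minimal sketch, assuming the change described above; not the real patch.
import torch


def resolve_device(device):
    """Turn a user-supplied device (int index or string) into a torch.device."""
    # Previously safetensors hard-coded f"cuda:{index}" for a bare integer
    # index, which breaks on machines without CUDA. Deferring to torch keeps
    # 'cuda:N' on torch <= 2.4 and resolves a bare index to the current
    # accelerator (cuda, xpu, ...) on torch >= 2.5.
    return torch.device(device)


# resolve_device(0)       -> cuda:0 on torch <= 2.4, current accelerator on >= 2.5
# resolve_device("xpu:1") -> xpu:1
```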
CC: @guangyey @jgong5 @faaany @muellerzr @SunMarc @Narsil