Is multi-gpu training available? #352

Open
limapedro opened this issue Oct 12, 2021 · 3 comments


@limapedro

So, support for TensorFlow 1.x seems to be almost complete. Is the multi_gpu_model function from tf.keras.utils working? This is something I'm looking forward to; any information on this would be extremely helpful.

Would something like this work?

```python
model = multi_gpu_model(model, gpus=2)
```

@jstoecker (Contributor)

The tf.keras.utils.multi_gpu_model helper function will not work with DML devices because the helpers are hard-coded to look for "gpu" devices. Since "gpu" effectively implies CUDA in TF1, changing this simple string has far-reaching implications in numerous helpers (unfortunately). It may be possible to modify this or other helpers to support "dml" devices but it is not something we planned for or tested.
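
If you want to confirm what device types your install actually registers, something like the sketch below should show it. The "DML" device-type string is an assumption here (based on the "dml" devices mentioned above), so check what your build actually reports:

```python
# Sketch: list the device types TF1 registers. With tensorflow-directml
# the expectation (an assumption) is entries like /device:DML:0 rather
# than /device:GPU:0, which is why helpers that filter on the "GPU"
# device type find nothing to replicate onto.
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    print(dev.name, dev.device_type)
```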

You can build graphs and explicitly assign portions to different "dml" devices in TF 1.15 (e.g. using tf.device), but full coverage of the various multi-GPU and distributed-GPU features in TF1 isn't complete. We decided to focus our efforts on the TF2 pluggable device model first; if we do support multi-GPU more robustly, it will likely be in TF2, where the core runtime is built with additional non-CPU/CUDA plugin-based device backends in mind.
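
As a rough illustration of the explicit-placement approach, something like the following should work in TF 1.15. The "/device:DML:0" and "/device:DML:1" names are assumptions, not verified; use whatever device_lib.list_local_devices() reports on your system:

```python
import tensorflow as tf

# Build a small graph and pin individual ops to specific DirectML
# devices via tf.device (device names here are assumed).
a = tf.random.uniform([1024, 1024])
b = tf.random.uniform([1024, 1024])

with tf.device("/device:DML:0"):
    c0 = tf.matmul(a, b)  # runs on the first DML device

with tf.device("/device:DML:1"):
    c1 = tf.matmul(a, b)  # runs on the second DML device

total = c0 + c1  # TF places the add wherever it sees fit

with tf.compat.v1.Session() as sess:
    sess.run(total)
```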

@zhmlcg commented Dec 27, 2021

@jstoecker: Do you have a schedule for when complete support of TF2, and in turn of multi- or distributed-GPU, will be finished? Is it possible to have it by the end of 2022?

@jstoecker (Contributor)

> Do you have a schedule for when complete support of TF2, and in turn of multi- or distributed-GPU, will be finished? Is it possible to have it by the end of 2022?

We intend to release a TF2 package in the next few months with functional coverage comparable to what's in TF1, but it's still too early to say whether multi-GPU will follow immediately after this package. Adding @PatriceVignola to this discussion in case anything has changed or will change regarding prioritization here.
