Is multi-gpu training available? #352

Open
limapedro opened this issue Oct 12, 2021 · 3 comments


@limapedro

So, support for TensorFlow 1.x seems to be almost complete. Is the multi_gpu_model function from tf.keras.utils working? This is something I'm looking forward to; any information on this would be extremely helpful.

Would something like this work?

```python
model = multi_gpu_model(model, gpus=2)
```

@jstoecker (Contributor)

The tf.keras.utils.multi_gpu_model helper function will not work with DML devices because the helpers are hard-coded to look for "gpu" devices. Since "gpu" effectively implies CUDA in TF1, changing this simple string has far-reaching implications in numerous helpers (unfortunately). It may be possible to modify this or other helpers to support "dml" devices but it is not something we planned for or tested.
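
If you want to confirm what device types your install actually registers, something like the sketch below should show it. The "DML" device-type string is an assumption here (based on the "dml" devices mentioned above), so check what your build actually reports:

```python
# Sketch: list the device types TF1 registers. With tensorflow-directml
# the expectation (an assumption) is entries like /device:DML:0 rather
# than /device:GPU:0, which is why helpers that filter on the "GPU"
# device type find nothing to replicate onto.
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    print(dev.name, dev.device_type)
```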

You can build graphs and explicitly assign portions to different "dml" devices in TF 1.15 (e.g. using tf.device), but full coverage of the various multi-GPU and distributed-GPU features in TF1 isn't complete. We decided to focus our efforts on the TF2 pluggable device model first; if we do support multi-GPU more robustly, it will likely be in TF2, where the core runtime is built with additional non-CPU/CUDA plugin-based device backends in mind.
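
As a rough illustration of the explicit-placement approach, something like the following should work in TF 1.15. The "/device:DML:0" and "/device:DML:1" names are assumptions, not verified; use whatever device_lib.list_local_devices() reports on your system:

```python
import tensorflow as tf

# Build a small graph and pin individual ops to specific DirectML
# devices via tf.device (device names here are assumed).
a = tf.random.uniform([1024, 1024])
b = tf.random.uniform([1024, 1024])

with tf.device("/device:DML:0"):
    c0 = tf.matmul(a, b)  # runs on the first DML device

with tf.device("/device:DML:1"):
    c1 = tf.matmul(a, b)  # runs on the second DML device

total = c0 + c1  # TF places the add wherever it sees fit

with tf.compat.v1.Session() as sess:
    sess.run(total)
```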

@zhmlcg commented Dec 27, 2021

@jstoecker: Do you have a schedule for when complete support of TF2, and in turn of multi- or distributed-GPU, will be finished? Is it possible to have it by the end of 2022?

@jstoecker (Contributor)

> Do you have a schedule for when complete support of TF2, and in turn of multi- or distributed-GPU, will be finished? Is it possible to have it by the end of 2022?

We intend to release a TF2 package in the next few months with functional coverage comparable to what's in TF1, but it's still too early to say whether multi-GPU will follow immediately after this package. Adding @PatriceVignola to this discussion in case anything has changed or will change regarding prioritization here.
